All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] fuse: handle release synchronously (v4)
@ 2014-09-25 12:05 Maxim Patlasov
  2014-09-25 12:05 ` [PATCH 1/5] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags Maxim Patlasov
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Maxim Patlasov @ 2014-09-25 12:05 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

Hi,

There is a long-standing demand for synchronous behaviour of fuse_release:

http://sourceforge.net/mailarchive/message.php?msg_id=19343889
http://sourceforge.net/mailarchive/message.php?msg_id=29814693

A year ago Avati and me explained why such a feature would be useful:

http://sourceforge.net/mailarchive/message.php?msg_id=29889055
http://sourceforge.net/mailarchive/message.php?msg_id=29867423

In short, the problem is that fuse_release (that's usually called on last
user close(2)) sends FUSE_RELEASE to userspace and returns without waiting
for ACK from userspace. Consequently, there is a gap when user regards the
file released while userspace fuse is still working on it. An attempt to
access the file from another node leads to complicated synchronization
problems because the first node still "holds" the file.

The patch-set resolves the problem by making fuse_release synchronous:
wait for ACK from userspace for FUSE_RELEASE if the feature is ON.

It must be emphasized that even if the feature is enabled (i.e. fuse_release
is synchronous), nothing guarantees that fuse_release() will be called
in the context of close(2). In fact, it may happen later, on last fput().
However, there are still a lot of use cases when close(2) is synchronous,
so the feature must be regarded as an optimization maximizing chances of
synchronous behaviour of close(2).

To keep single-threaded userspace implementations happy the patch-set
ensures that by the time fuse_release_common calls fuse_file_put, no
more in-flight I/O exists. Asynchronous fuse callbacks (like
fuse_readpages_end) cannot trigger FUSE_RELEASE anymore. Hence, we'll
never block in contexts other than close().

Changed in v2:
 - improved comments, commented spin_unlock_wait out according to Brian'
suggestions.
 - rebased on v3.15-rc8 tag of Linus' tree.
Changed in v3:
 - changed patch-set title from "close file synchronously" to "handle
   release synchronously"
 - renamed CLOSE_WAIT to SYNC_RELEASE to convey the essence of the feature
   ("synchronous release" instead of "wait on close")
 - enabled feature on per file basis (instead of global fuse_conn parameter)
 - added "disable_sync_release" mount option
 - rebased on v3.17-rc1 tag of Linus' tree.
Changed in v4:
 - removed redundant locking around __fuse_file_put()
 - removed a patch making FUSE_RELEASE request uninterruptible. As Miklos
   pointed out, if the request is interrupted, then we should possibly allow
   that, effectively backgrounding the request. However, such a backgrounding
   is not easy to implement without breaking locking expectations of the
   userspace filesystem. Given the problem has existed for fuseblk mount for
   long time and there is no reasonable solution now, I put it aside for the
   future.

Thanks,
Maxim

---

Maxim Patlasov (5):
      fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags
      fuse: cosmetic rework of fuse_send_readpages
      fuse: wait for end of IO on release
      fuse: add mount option to disable synchronous release
      fuse: enable close_wait synchronous release


 fs/fuse/file.c            |   97 ++++++++++++++++++++++++++++++++++++++-------
 fs/fuse/fuse_i.h          |    3 +
 fs/fuse/inode.c           |    8 ++++
 include/uapi/linux/fuse.h |    3 +
 4 files changed, 95 insertions(+), 16 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/5] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags
  2014-09-25 12:05 [PATCH 0/5] fuse: handle release synchronously (v4) Maxim Patlasov
@ 2014-09-25 12:05 ` Maxim Patlasov
  2014-09-25 12:05 ` [PATCH 2/5] fuse: cosmetic rework of fuse_send_readpages Maxim Patlasov
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Maxim Patlasov @ 2014-09-25 12:05 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

The feature will be governed by fuse file open flag FOPEN_SYNC_RELEASE.
Userspace can enable it on per file basis in the same way as for
FOPEN_KEEP_CACHE or FOPEN_DIRECT_IO.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 include/uapi/linux/fuse.h |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 25084a0..607c45c 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -205,10 +205,13 @@ struct fuse_file_lock {
  * FOPEN_DIRECT_IO: bypass page cache for this open file
  * FOPEN_KEEP_CACHE: don't invalidate the data cache on open
  * FOPEN_NONSEEKABLE: the file is not seekable
+ * FOPEN_SYNC_RELEASE: synchronously release file on last fput,
+ *                     which, in turn, not always bound to fclose(2)!
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
 #define FOPEN_NONSEEKABLE	(1 << 2)
+#define FOPEN_SYNC_RELEASE	(1 << 3)
 
 /**
  * INIT request/reply flags


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/5] fuse: cosmetic rework of fuse_send_readpages
  2014-09-25 12:05 [PATCH 0/5] fuse: handle release synchronously (v4) Maxim Patlasov
  2014-09-25 12:05 ` [PATCH 1/5] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags Maxim Patlasov
@ 2014-09-25 12:05 ` Maxim Patlasov
  2014-09-25 12:06 ` [PATCH 3/5] fuse: wait for end of IO on release Maxim Patlasov
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Maxim Patlasov @ 2014-09-25 12:05 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

The patch change arguments of fuse_send_readpages to give it access to inode
(will be used in the next patch of patch-set). The change is cosmetic,
no logic changed.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 912061a..7723b3f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -827,8 +827,17 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
 		fuse_file_put(req->ff, false);
 }
 
-static void fuse_send_readpages(struct fuse_req *req, struct file *file)
+struct fuse_fill_data {
+	struct fuse_req *req;
+	struct file *file;
+	struct inode *inode;
+	unsigned nr_pages;
+};
+
+static void fuse_send_readpages(struct fuse_fill_data *data)
 {
+	struct fuse_req *req = data->req;
+	struct file *file = data->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	loff_t pos = page_offset(req->pages[0]);
@@ -850,13 +859,6 @@ static void fuse_send_readpages(struct fuse_req *req, struct file *file)
 	}
 }
 
-struct fuse_fill_data {
-	struct fuse_req *req;
-	struct file *file;
-	struct inode *inode;
-	unsigned nr_pages;
-};
-
 static int fuse_readpages_fill(void *_data, struct page *page)
 {
 	struct fuse_fill_data *data = _data;
@@ -872,7 +874,7 @@ static int fuse_readpages_fill(void *_data, struct page *page)
 	     req->pages[req->num_pages - 1]->index + 1 != page->index)) {
 		int nr_alloc = min_t(unsigned, data->nr_pages,
 				     FUSE_MAX_PAGES_PER_REQ);
-		fuse_send_readpages(req, data->file);
+		fuse_send_readpages(data);
 		if (fc->async_read)
 			req = fuse_get_req_for_background(fc, nr_alloc);
 		else
@@ -925,7 +927,7 @@ static int fuse_readpages(struct file *file, struct address_space *mapping,
 	err = read_cache_pages(mapping, pages, fuse_readpages_fill, &data);
 	if (!err) {
 		if (data.req->num_pages)
-			fuse_send_readpages(data.req, file);
+			fuse_send_readpages(&data);
 		else
 			fuse_put_request(fc, data.req);
 	}


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/5] fuse: wait for end of IO on release
  2014-09-25 12:05 [PATCH 0/5] fuse: handle release synchronously (v4) Maxim Patlasov
  2014-09-25 12:05 ` [PATCH 1/5] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags Maxim Patlasov
  2014-09-25 12:05 ` [PATCH 2/5] fuse: cosmetic rework of fuse_send_readpages Maxim Patlasov
@ 2014-09-25 12:06 ` Maxim Patlasov
  2014-09-25 12:06 ` [PATCH 4/5] fuse: add mount option to disable synchronous release Maxim Patlasov
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Maxim Patlasov @ 2014-09-25 12:06 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

There are two types of I/O activity that can be "in progress" at the time
of fuse_release() execution: asynchronous read-ahead and write-back. The
patch ensures that they are completed before fuse_release_common sends
FUSE_RELEASE to userspace.

So far as fuse_release() waits for end of async I/O, its callbacks
(fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
be the last holders of fuse file anymore. To emphasize the fact, the patch
replaces fuse_file_put with __fuse_file_put there.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7723b3f..8713e62 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -149,6 +149,17 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
 	}
 }
 
+/*
+ * Asynchronous callbacks may use it instead of fuse_file_put() because
+ * we guarantee that they are never last holders of ff. Hitting BUG() below
+ * will make clear any violation of the guarantee.
+ */
+static void __fuse_file_put(struct fuse_file *ff)
+{
+	if (atomic_dec_and_test(&ff->count))
+		BUG();
+}
+
 int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 		 bool isdir)
 {
@@ -279,6 +290,11 @@ static void fuse_prepare_release(struct fuse_file *ff, int flags, int opcode)
 	req->in.args[0].value = inarg;
 }
 
+static bool must_release_synchronously(struct fuse_file *ff)
+{
+	return ff->open_flags & FOPEN_SYNC_RELEASE;
+}
+
 void fuse_release_common(struct file *file, int opcode)
 {
 	struct fuse_file *ff;
@@ -302,6 +318,13 @@ void fuse_release_common(struct file *file, int opcode)
 	req->misc.release.path = file->f_path;
 
 	/*
+	 * No more in-flight asynchronous READ or WRITE requests if
+	 * fuse file release is synchronous
+	 */
+	if (must_release_synchronously(ff))
+		BUG_ON(atomic_read(&ff->count) != 1);
+
+	/*
 	 * Normally this will send the RELEASE request, however if
 	 * some asynchronous READ or WRITE requests are outstanding,
 	 * the sending will be delayed.
@@ -321,11 +344,34 @@ static int fuse_open(struct inode *inode, struct file *file)
 static int fuse_release(struct inode *inode, struct file *file)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_file *ff = file->private_data;
 
 	/* see fuse_vma_close() for !writeback_cache case */
 	if (fc->writeback_cache)
 		write_inode_now(inode, 1);
 
+	if (must_release_synchronously(ff)) {
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		/*
+		 * Must remove file from write list. Otherwise it is possible
+		 * this file will get more writeback from another files
+		 * rerouted via write_files.
+		 */
+		spin_lock(&ff->fc->lock);
+		list_del_init(&ff->write_entry);
+		spin_unlock(&ff->fc->lock);
+
+		wait_event(fi->page_waitq, atomic_read(&ff->count) == 1);
+
+		/*
+		 * spin_unlock_wait(&ff->fc->lock) would be natural here to
+		 * wait for threads just released ff to leave their critical
+		 * sections. But taking spinlock is the first thing
+		 * fuse_release_common does, so that this is unnecessary.
+		 */
+	}
+
 	fuse_release_common(file, FUSE_RELEASE);
 
 	/* return value is ignored by VFS */
@@ -823,8 +869,15 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
 		unlock_page(page);
 		page_cache_release(page);
 	}
-	if (req->ff)
-		fuse_file_put(req->ff, false);
+	if (req->ff) {
+		if (must_release_synchronously(req->ff)) {
+			struct fuse_inode *fi = get_fuse_inode(req->inode);
+
+			__fuse_file_put(req->ff);
+			wake_up(&fi->page_waitq);
+		} else
+			fuse_file_put(req->ff, false);
+	}
 }
 
 struct fuse_fill_data {
@@ -851,6 +904,7 @@ static void fuse_send_readpages(struct fuse_fill_data *data)
 	if (fc->async_read) {
 		req->ff = fuse_file_get(ff);
 		req->end = fuse_readpages_end;
+		req->inode = data->inode;
 		fuse_request_send_background(fc, req);
 	} else {
 		fuse_request_send(fc, req);
@@ -1502,7 +1556,7 @@ static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
 	for (i = 0; i < req->num_pages; i++)
 		__free_page(req->pages[i]);
 
-	if (req->ff)
+	if (req->ff && !must_release_synchronously(req->ff))
 		fuse_file_put(req->ff, false);
 }
 
@@ -1519,6 +1573,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
 		dec_zone_page_state(req->pages[i], NR_WRITEBACK_TEMP);
 		bdi_writeout_inc(bdi);
 	}
+	if (must_release_synchronously(req->ff))
+		__fuse_file_put(req->ff);
 	wake_up(&fi->page_waitq);
 }
 
@@ -1659,8 +1715,13 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 
 	ff = __fuse_write_file_get(fc, fi);
 	err = fuse_flush_times(inode, ff);
-	if (ff)
-		fuse_file_put(ff, 0);
+	if (ff) {
+		if (must_release_synchronously(ff)) {
+			__fuse_file_put(ff);
+			wake_up(&fi->page_waitq);
+		} else
+			fuse_file_put(ff, false);
+	}
 
 	return err;
 }


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/5] fuse: add mount option to disable synchronous release
  2014-09-25 12:05 [PATCH 0/5] fuse: handle release synchronously (v4) Maxim Patlasov
                   ` (2 preceding siblings ...)
  2014-09-25 12:06 ` [PATCH 3/5] fuse: wait for end of IO on release Maxim Patlasov
@ 2014-09-25 12:06 ` Maxim Patlasov
  2014-09-25 12:07 ` [PATCH 5/5] fuse: enable close_wait " Maxim Patlasov
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Maxim Patlasov @ 2014-09-25 12:06 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

Synchronous release ensures that kernel fuse reports to userspace exactly
last fput(). However, nothing guarantees that last fput() will happen in the
context of close(2). So userspace applications must be prepared to the
situation when close(2) returned, but processing of FUSE_RELEASE is not
completed by fuse daemon yet. To make testing such use cases easier, the
patch introduces DISABLE_SYNC_RELEASE mount option which effectively mask out
FOPEN_SYNC_RELEASE flag.

Also, the mount option can be used to protect from DoS scenarios with a sync
release: fusermount can be instrumented to use the option by default for
unprivileged mounts (allowing system administrator to configure it like
"user_allow_other").

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c   |    3 ++-
 fs/fuse/fuse_i.h |    3 +++
 fs/fuse/inode.c  |    8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 8713e62..c4599c8 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -292,7 +292,8 @@ static void fuse_prepare_release(struct fuse_file *ff, int flags, int opcode)
 
 static bool must_release_synchronously(struct fuse_file *ff)
 {
-	return ff->open_flags & FOPEN_SYNC_RELEASE;
+	return ff->open_flags & FOPEN_SYNC_RELEASE &&
+		!(ff->fc->flags & FUSE_DISABLE_SYNC_RELEASE);
 }
 
 void fuse_release_common(struct file *file, int opcode)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e8e47a6..c5e2fca 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -44,6 +44,9 @@
     doing the mount will be allowed to access the filesystem */
 #define FUSE_ALLOW_OTHER         (1 << 1)
 
+/** Disable synchronous release */
+#define FUSE_DISABLE_SYNC_RELEASE (1 << 2)
+
 /** Number of page pointers embedded in fuse_req */
 #define FUSE_REQ_INLINE_PAGES 1
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 03246cd..86d47d0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -463,6 +463,7 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_DISABLE_SYNC_RELEASE,
 	OPT_ERR
 };
 
@@ -475,6 +476,7 @@ static const match_table_t tokens = {
 	{OPT_ALLOW_OTHER,		"allow_other"},
 	{OPT_MAX_READ,			"max_read=%u"},
 	{OPT_BLKSIZE,			"blksize=%u"},
+	{OPT_DISABLE_SYNC_RELEASE,	"disable_sync_release"},
 	{OPT_ERR,			NULL}
 };
 
@@ -560,6 +562,10 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
 			d->blksize = value;
 			break;
 
+		case OPT_DISABLE_SYNC_RELEASE:
+			d->flags |= FUSE_DISABLE_SYNC_RELEASE;
+			break;
+
 		default:
 			return 0;
 		}
@@ -583,6 +589,8 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 		seq_puts(m, ",default_permissions");
 	if (fc->flags & FUSE_ALLOW_OTHER)
 		seq_puts(m, ",allow_other");
+	if (fc->flags & FUSE_DISABLE_SYNC_RELEASE)
+		seq_puts(m, ",disable_sync_release");
 	if (fc->max_read != ~0)
 		seq_printf(m, ",max_read=%u", fc->max_read);
 	if (sb->s_bdev && sb->s_blocksize != FUSE_DEFAULT_BLKSIZE)


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/5] fuse: enable close_wait synchronous release
  2014-09-25 12:05 [PATCH 0/5] fuse: handle release synchronously (v4) Maxim Patlasov
                   ` (3 preceding siblings ...)
  2014-09-25 12:06 ` [PATCH 4/5] fuse: add mount option to disable synchronous release Maxim Patlasov
@ 2014-09-25 12:07 ` Maxim Patlasov
  2014-09-26 15:28 ` [PATCH 0/5] fuse: handle release synchronously (v4) Miklos Szeredi
  2014-09-30  3:55 ` Linus Torvalds
  6 siblings, 0 replies; 33+ messages in thread
From: Maxim Patlasov @ 2014-09-25 12:07 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

The patch enables the feature by passing 'true' to fuse_file_put in
fuse_release_common.

Previously, this was safe only in special cases when we sure that
multi-threaded userspace won't deadlock if we'll synchronously send
FUSE_RELEASE in the context of read-ahead or write-back callback. Now, it's
always safe because callbacks don't send requests to userspace anymore.

The feature can be made privileged by means of DISABLE_SYNC_RELEASE mount
option implemented by the previous patch.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index c4599c8..858280f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -334,7 +334,8 @@ void fuse_release_common(struct file *file, int opcode)
 	 * synchronous RELEASE is allowed (and desirable) in this case
 	 * because the server can be trusted not to screw up.
 	 */
-	fuse_file_put(ff, ff->fc->destroy_req != NULL);
+	fuse_file_put(ff, ff->fc->destroy_req != NULL ||
+			  must_release_synchronously(ff));
 }
 
 static int fuse_open(struct inode *inode, struct file *file)


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-09-25 12:05 [PATCH 0/5] fuse: handle release synchronously (v4) Maxim Patlasov
                   ` (4 preceding siblings ...)
  2014-09-25 12:07 ` [PATCH 5/5] fuse: enable close_wait " Maxim Patlasov
@ 2014-09-26 15:28 ` Miklos Szeredi
  2014-09-30  3:15   ` Miklos Szeredi
  2014-09-30  3:55 ` Linus Torvalds
  6 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2014-09-26 15:28 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: fuse-devel, Anand Avati, Kernel Mailing List, Linux-Fsdevel,
	Al Viro, Linus Torvalds

[Adding CC's]

On Thu, Sep 25, 2014 at 2:05 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:

> In short, the problem is that fuse_release (that's usually called on last
> user close(2)) sends FUSE_RELEASE to userspace and returns without waiting
> for ACK from userspace. Consequently, there is a gap when user regards the
> file released while userspace fuse is still working on it. An attempt to
> access the file from another node leads to complicated synchronization
> problems because the first node still "holds" the file.
>
> The patch-set resolves the problem by making fuse_release synchronous:
> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.
>
> It must be emphasized that even if the feature is enabled (i.e. fuse_release
> is synchronous), nothing guarantees that fuse_release() will be called
> in the context of close(2). In fact, it may happen later, on last fput().
> However, there are still a lot of use cases when close(2) is synchronous,
> so the feature must be regarded as an optimization maximizing chances of
> synchronous behaviour of close(2).

Okay, we have the common case of close -> last-fput ->release.  This
being synchronous is fine.  Related cases are munmap(), and weird
corner cases including I/O on file descriptor being done in parallel
with close() in another thread.  Synchronous behavior is also fine in
these cases, since the task calling the last fput() is in fact
responsible for releasing the file.

And then we have the uncommon cases when fput() is called from
something unrelated.  Take the following DoS example:  malicious app
creates a socket loop (sockA is sent over sockB and sockB is sent over
sockA) and in addition it tacks a fuse fd onto one of the sockets.
The fuse fd is implemented to block forever on release.  In this case
the loop will persist after the sockets are closed due to refcounting.
Later, a garbage collection is triggered from a completely unrelated
socket operation.   This result in the unrelated task being blocked
forever.

The simplest way to avoid such an attack is to make the sync-release
feature privileged.  But even if it's privileged, the fact that
->release can take a lot of time and a completely unrelated task could
be waiting for it to finish is not a good thing.

So I'm wondering: could we have some way of distinguishing "good
release" from "bad release"?  Maybe adding an fput_sync() variant that
passes an "sync" flag to ->release()?

Al, Linus?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-09-26 15:28 ` [PATCH 0/5] fuse: handle release synchronously (v4) Miklos Szeredi
@ 2014-09-30  3:15   ` Miklos Szeredi
  0 siblings, 0 replies; 33+ messages in thread
From: Miklos Szeredi @ 2014-09-30  3:15 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: fuse-devel, Anand Avati, Kernel Mailing List, Linux-Fsdevel,
	Al Viro, Linus Torvalds

On Fri, Sep 26, 2014 at 5:28 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> [Adding CC's]
>
> On Thu, Sep 25, 2014 at 2:05 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
>
>> In short, the problem is that fuse_release (that's usually called on last
>> user close(2)) sends FUSE_RELEASE to userspace and returns without waiting
>> for ACK from userspace. Consequently, there is a gap when user regards the
>> file released while userspace fuse is still working on it. An attempt to
>> access the file from another node leads to complicated synchronization
>> problems because the first node still "holds" the file.
>>
>> The patch-set resolves the problem by making fuse_release synchronous:
>> wait for ACK from userspace for FUSE_RELEASE if the feature is ON.
>>
>> It must be emphasized that even if the feature is enabled (i.e. fuse_release
>> is synchronous), nothing guarantees that fuse_release() will be called
>> in the context of close(2). In fact, it may happen later, on last fput().
>> However, there are still a lot of use cases when close(2) is synchronous,
>> so the feature must be regarded as an optimization maximizing chances of
>> synchronous behaviour of close(2).
>
> Okay, we have the common case of close -> last-fput ->release.  This
> being synchronous is fine.  Related cases are munmap(), and weird
> corner cases including I/O on file descriptor being done in parallel
> with close() in another thread.  Synchronous behavior is also fine in
> these cases, since the task calling the last fput() is in fact
> responsible for releasing the file.
>
> And then we have the uncommon cases when fput() is called from
> something unrelated.  Take the following DoS example:  malicious app
> creates a socket loop (sockA is sent over sockB and sockB is sent over
> sockA) and in addition it tacks a fuse fd onto one of the sockets.
> The fuse fd is implemented to block forever on release.  In this case
> the loop will persist after the sockets are closed due to refcounting.
> Later, a garbage collection is triggered from a completely unrelated
> socket operation.   This result in the unrelated task being blocked
> forever.
>
> The simplest way to avoid such an attack is to make the sync-release
> feature privileged.  But even if it's privileged, the fact that
> ->release can take a lot of time and a completely unrelated task could
> be waiting for it to finish is not a good thing.
>
> So I'm wondering: could we have some way of distinguishing "good
> release" from "bad release"?  Maybe adding an fput_sync() variant that
> passes an "sync" flag to ->release()?

If we just concentrate on close(2), then there's a much simpler
solution: in fuse_flush() check if file_count(file) is 1.  If it is,
then this is a final close and we can mark the FLUSH request as such
with a flag (FUSE_FLUSH_FINAL).  Userspace filesystem can act
accordingly and may even return an error value to close(2).  In this
case we could even omit the RELEASE request as an optimization, since
we know the file cannot be resurrected if this was the last reference.

Do we need to be synchronous in any other case?  E.g. munmap() comes to mind.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-09-25 12:05 [PATCH 0/5] fuse: handle release synchronously (v4) Maxim Patlasov
                   ` (5 preceding siblings ...)
  2014-09-26 15:28 ` [PATCH 0/5] fuse: handle release synchronously (v4) Miklos Szeredi
@ 2014-09-30  3:55 ` Linus Torvalds
       [not found]   ` <CAFboF2yhGyjk4e_CHQV5b2WvB-QhsWNyHvFiFG_OM_=3-KArLQ@mail.gmail.com>
  6 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2014-09-30  3:55 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: Miklos Szeredi, open list:FUSE: FILESYSTEM...,
	avati, Linux Kernel Mailing List

On Thu, Sep 25, 2014 at 5:05 AM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
>
> There is a long-standing demand for synchronous behaviour of fuse_release:

That's just complete bullshit.

The fact is, release() is not synchronous. End of story.

If you want to catch "close()" synchronously, you use flush(). The two
are NOT the same, and never will be, and never should be confused.

"release()" happens at some random time (but only once), and cannot
return an error.

"flush()" happens synchronously at close() time, and can return an error.

Anybody who confuses the two is *wrong*.

It sounds like somebody wants to use "flush()" in fuse.  But please
don't mistake that for "release". It's different.

So please kill this "FOPEN_SYNC_RELEASE" thing with fire. It's crazy,
it's wrong, it's stupid. It must die.

               Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
       [not found]   ` <CAFboF2yhGyjk4e_CHQV5b2WvB-QhsWNyHvFiFG_OM_=3-KArLQ@mail.gmail.com>
@ 2014-09-30  7:43     ` Miklos Szeredi
  2014-09-30 18:22     ` Linus Torvalds
  2014-09-30 19:04     ` Linus Torvalds
  2 siblings, 0 replies; 33+ messages in thread
From: Miklos Szeredi @ 2014-09-30  7:43 UTC (permalink / raw)
  To: Anand Avati
  Cc: Linus Torvalds, Maxim Patlasov, open list:FUSE: FILESYSTEM...,
	Linux Kernel Mailing List

On Tue, Sep 30, 2014 at 7:33 AM, Anand Avati <avati@gluster.org> wrote:

> In general that sounds reasonable. The problem (as described in the original
> thread, at http://sourceforge.net/p/fuse/mailman/message/29889055/) happens
> in the presence of dup(). Tools like dd (and others) call dup(), and a
> second file descriptor is created for the same filp. Every descriptor's
> close() results in a ->flush(). The final close() of course results in
> ->release() as well.
>
> Because dup() is silently handled in VFS (and a FUSE filesystem is
> completely unaware), from a FUSE filesystem's point of view this is how
> operations look:
>
> - OPEN
> - ... <I/O ops>
> - FLUSH
> - ... <I/O ops>
> - FLUSH
> - ... <many more FLUSHes and I/O>
> - FLUSH
> - RELEASE
>
> So, for a given open(), one release() is guaranteed. There can be one or
> more (arbitrary number) of flush()es. If one wants to implement semantics
> where a second concurrent open()er gets EBUSY, they have to pick between one
> of the two evils:
>
> - Assume the first FLUSH is the last FLUSH and release internal locks/leases
> and therefore break semantics
> - Wait for RELEASE and asynchronously release internal locks/leases and
> therefore result in spurious EBUSY (between last close() and ->release())

As I wrote, we may identify the last FLUSH, if there aren't any more
references to the file.  This is by far the most common case.   Most
obvious exceptions:

 - mmaped file: there won't be a FLUSH after the file is closed, but
there might well be I/O.

 - file is sent over UNIX domain socket.  There might or might not be
a FLUSH depending on whether the file was received by the other end or
not.  There's no I/O while the file is being sent.

 - close() racing with read()/write() in multithreaded app.  FLUSH
might get there first before the actual I/O.

In these cases the "last flush" flag will *not* be set.

So we can have several cases:

 - "last flush" is set: this is indeed the last flush
 - "last flush" is not set and this is not the last flush
 - "last flush" is not set and this is the last flush, but there are
still I/O coming before the release
 - "last flush" is not set and this is the last flush and no I/O
happens before the release

Would this work for you?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
       [not found]   ` <CAFboF2yhGyjk4e_CHQV5b2WvB-QhsWNyHvFiFG_OM_=3-KArLQ@mail.gmail.com>
  2014-09-30  7:43     ` Miklos Szeredi
@ 2014-09-30 18:22     ` Linus Torvalds
  2014-09-30 19:04     ` Linus Torvalds
  2 siblings, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2014-09-30 18:22 UTC (permalink / raw)
  To: Anand Avati
  Cc: Maxim Patlasov, Miklos Szeredi, open list:FUSE: FILESYSTEM...,
	Linux Kernel Mailing List

On Mon, Sep 29, 2014 at 10:33 PM, Anand Avati <avati@gluster.org> wrote:
>
> In general that sounds reasonable. The problem (as described in the original
> thread, at http://sourceforge.net/p/fuse/mailman/message/29889055/) happens
> in the presence of dup(). Tools like dd (and others) call dup(), and a
> second file descriptor is created for the same filp. Every descriptor's
> close() results in a ->flush(). The final close() of course results in
> ->release() as well.

The thing is, that final close() does not AT ALL "of course result in
->release() as well".

There are any number of things that can delat that release. Not just
mmap(), but simply things like some other process looking at the file
descriptor in /proc.

So I agree that flush can - and will - be called multiple times. So
what? That's the only thing you have if you want something
*synchronous*. Really.

fput() is not synchronous, and never will be. End of discussion. If
you think it is, you are wrong. If you try to make it so, you're
making things worse. Really. Really really.

Real filesystems handle this flush vs release difference well. If
yours does not, yours is broken. Understand that.

          Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
       [not found]   ` <CAFboF2yhGyjk4e_CHQV5b2WvB-QhsWNyHvFiFG_OM_=3-KArLQ@mail.gmail.com>
  2014-09-30  7:43     ` Miklos Szeredi
  2014-09-30 18:22     ` Linus Torvalds
@ 2014-09-30 19:04     ` Linus Torvalds
  2014-09-30 19:19       ` Miklos Szeredi
  2 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2014-09-30 19:04 UTC (permalink / raw)
  To: Anand Avati
  Cc: Maxim Patlasov, Miklos Szeredi, open list:FUSE: FILESYSTEM...,
	Linux Kernel Mailing List

On Mon, Sep 29, 2014 at 10:33 PM, Anand Avati <avati@gluster.org> wrote:
>
> In general that sounds reasonable. The problem (as described in the original
> thread, at http://sourceforge.net/p/fuse/mailman/message/29889055/)

>From a quick look at that thread, the solution is clear: you *must*
flush your write buffers in the "flush" function.

The fact that you must flush write buffers multiple times if people
have done "dup()" is a complete non-issue. Just flush them each time.
There is no "how do I differentiate the first flush and
the second flush?" The answer is that you don't, and that you MUST
NOT. You need to flush on both (or more). Trying to distinguish first
vs second is broken, and would be wrogn *anyway*. There is no possible
situation where it could validaly matter, and you simply cannot tell.

Don't do any data structure cleanups, that's for "release()". But yes,
you do have to flush write buffers at flush time (and return IO errors
if they happen). That's very much the point of flush.

               Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-09-30 19:04     ` Linus Torvalds
@ 2014-09-30 19:19       ` Miklos Szeredi
  2014-09-30 20:44         ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2014-09-30 19:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Anand Avati, Maxim Patlasov, open list:FUSE: FILESYSTEM...,
	Linux Kernel Mailing List

On Tue, Sep 30, 2014 at 12:04:08PM -0700, Linus Torvalds wrote:
> On Mon, Sep 29, 2014 at 10:33 PM, Anand Avati <avati@gluster.org> wrote:
> >
> > In general that sounds reasonable. The problem (as described in the original
> > thread, at http://sourceforge.net/p/fuse/mailman/message/29889055/)
> 
> From a quick look at that thread, the solution is clear: you *must*
> flush your write buffers in the "flush" function.
> 
> The fact that you must flush write buffers multiple times if people
> have done "dup()" is a complete non-issue. Just flush them each time.
> There is no "how do I differentiate the first flush and
> the second flush?" The answer is that you don't, and that you MUST
> NOT. You need to flush on both (or more). Trying to distinguish first
> vs second is broken, and would be wrogn *anyway*. There is no possible
> situation where it could validaly matter, and you simply cannot tell.
> 
> Don't do any data structure cleanups, that's for "release()". But yes,
> you do have to flush write buffers at flush time (and return IO errors
> if they happen). That's very much the point of flush.

What about flock(2), FL_SETLEASE, etc semantics (which are the sane ones,
compared to the POSIX locks shit which mandates release of lock on each close(2)
instead of "when all [duplicate] descriptors have been closed")?

You have to do that from ->release(), there's no question about that.  And while
I haven't looked at the wording of the standards, doing that synchronously with
the last close is a pretty decent thing to expect.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-09-30 19:19       ` Miklos Szeredi
@ 2014-09-30 20:44         ` Linus Torvalds
  2014-10-01  3:47           ` Miklos Szeredi
  2014-10-01 11:28           ` Maxim Patlasov
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2014-09-30 20:44 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Anand Avati, Maxim Patlasov, open list:FUSE: FILESYSTEM...,
	Linux Kernel Mailing List

On Tue, Sep 30, 2014 at 12:19 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> What about flock(2), FL_SETLEASE, etc semantics (which are the sane ones,
> compared to the POSIX locks shit which mandates release of lock on each close(2)
> instead of "when all [duplicate] descriptors have been closed")?
>
> You have to do that from ->release(), there's no question about that.

We do locks_remove_file() independently on ->release, but yes, it's
basically done just before the last release.

But it has the *exact* same semantics as release, including very much
having nothing what-so-ever to do with "last close()".

If the file descriptor is opened for other reasons (ie mmap, /proc
accesses, whatever), then that delays locks_remove_file() the same way
it delays release.

None of that has *anothing* to do with "synchronous". Thinking it does is wrong.

And none of this has *anything* to do with the issue that Maxim
pointed to in the mailing list web page, which was about write caches,
and how you cannot (and MUST NOT) delay them until release time.

            Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-09-30 20:44         ` Linus Torvalds
@ 2014-10-01  3:47           ` Miklos Szeredi
  2014-10-01 11:28           ` Maxim Patlasov
  1 sibling, 0 replies; 33+ messages in thread
From: Miklos Szeredi @ 2014-10-01  3:47 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Anand Avati, Maxim Patlasov, open list:FUSE: FILESYSTEM...,
	Linux Kernel Mailing List

On Tue, Sep 30, 2014 at 10:44 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Sep 30, 2014 at 12:19 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>>
>> What about flock(2), FL_SETLEASE, etc semantics (which are the sane ones,
>> compared to the POSIX locks shit which mandates release of lock on each close(2)
>> instead of "when all [duplicate] descriptors have been closed")?
>>
>> You have to do that from ->release(), there's no question about that.
>
> We do locks_remove_file() independently on ->release, but yes, it's
> basically done just before the last release.
>
> But it has the *exact* same semantics as release, including very much
> having nothing what-so-ever to do with "last close()".
>
> If the file descriptor is opened for other reasons (ie mmap, /proc
> accesses, whatever), then that delays locks_remove_file() the same way
> it delays release.
>
> None of that has *anothing* to do with "synchronous". Thinking it does is wrong.

Oh, and btw. nfsv4 has a synchronous ->release implementation.
Probably for exactly the same reason Anand and Maxim wants it in their
server.  Note: locks_remove_file() works for local locking, but
distributed filesystems need to release locks and leases in their
->release.

And yes, /proc does temporary ref of file.  But it's an implementation
detail, and should be fixable.  As to mmap, it's almost like dup(2)
regarding the file, we could actually handle it that way and have a
flush at munmap time too.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-09-30 20:44         ` Linus Torvalds
  2014-10-01  3:47           ` Miklos Szeredi
@ 2014-10-01 11:28           ` Maxim Patlasov
  2014-10-09  8:14             ` Miklos Szeredi
  1 sibling, 1 reply; 33+ messages in thread
From: Maxim Patlasov @ 2014-10-01 11:28 UTC (permalink / raw)
  To: Linus Torvalds, Miklos Szeredi
  Cc: Anand Avati, open list:FUSE: FILESYSTEM...,
	Linux Kernel Mailing List, mtheall

On 10/01/2014 12:44 AM, Linus Torvalds wrote:
> On Tue, Sep 30, 2014 at 12:19 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> What about flock(2), FL_SETLEASE, etc semantics (which are the sane ones,
>> compared to the POSIX locks shit which mandates release of lock on each close(2)
>> instead of "when all [duplicate] descriptors have been closed")?
>>
>> You have to do that from ->release(), there's no question about that.
> We do locks_remove_file() independently on ->release, but yes, it's
> basically done just before the last release.
>
> But it has the *exact* same semantics as release, including very much
> having nothing what-so-ever to do with "last close()".
>
> If the file descriptor is opened for other reasons (ie mmap, /proc
> accesses, whatever), then that delays locks_remove_file() the same way
> it delays release.
>
> None of that has *anothing* to do with "synchronous". Thinking it does is wrong.
>
> And none of this has *anything* to do with the issue that Maxim
> pointed to in the mailing list web page, which was about write caches,
> and how you cannot (and MUST NOT) delay them until release time.

I apologise for mentioning that mailing list web page in my title 
message. This was really misleading, I had to think about it in advance. 
Of course, write caches must be flushed in scope of ->flush(), not 
->release(). Let me please set forth an use-case that led me to those 
patches.

We implemented a FUSE-based distributed storage solution intended for 
keeping images of VMs (virtual machines) and their configuration files. 
The way how VMs use images makes exclusive-open()er semantics very 
attractive: while a VM is using its image on a node, the concurrent 
access from other nodes to that image is neither desirable nor 
necessary. So,  we acquire an exclusive lease on FUSE_OPEN and release 
it on FUSE_RELEASE. This is quite natural and has obviously nothing to 
do with FUSE_FLUSH.

Following such semantics, there are two choices for handling open() if 
the file is currently exclusively locked by a remote node: (a) return 
EBUSY; (b) block until the remote node release the file. We decided for 
(a), because (b) is very inconvenient in practice: most applications 
handle failed open(2) properly, but very few are clever enough to spawn 
a separate thread with open() and kill it if the open() has not 
succeeded in a reasonable time.

The patches I sent make essentially one thing: they make FUSE 
->release() wait for ACK from userspace before return. Without these 
patches, any attempt to test or use our storage in valid use-cases led 
to spurious EBUSY. For example, while migrating a VM from one node to 
another, we firstly close the image file on source node, then try to 
open it on destination node, but fail because FUSE_RELEASE is not 
processed by userspace on source node yet.

Given those patches must die, do you have any ideas how to resolve that 
"spurious EBUSY" problem?

Thanks,
Maxim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-01 11:28           ` Maxim Patlasov
@ 2014-10-09  8:14             ` Miklos Szeredi
  2014-10-16 10:31               ` Maxim Patlasov
  0 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2014-10-09  8:14 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: Linus Torvalds, Anand Avati, Linux Kernel Mailing List,
	Michael j Theall, fuse-devel

On Wed, Oct 1, 2014 at 1:28 PM, Maxim Patlasov <mpatlasov@parallels.com> wrote:
> Given those patches must die, do you have any ideas how to resolve that
> "spurious EBUSY" problem?

Check the "sync_release" branch of fuse:

  git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git sync_release

And same branch name for libfuse:

  git://git.code.sf.net/p/fuse/fuse sync_release

What it does is send RELEASE from ->flush() after checking the
refcount of file (being careful about RCU accesses).

Lightly tested, more testing, as well as review, is welcome.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-09  8:14             ` Miklos Szeredi
@ 2014-10-16 10:31               ` Maxim Patlasov
  2014-10-16 13:43                 ` Miklos Szeredi
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Patlasov @ 2014-10-16 10:31 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Anand Avati, Linux Kernel Mailing List,
	Michael j Theall, fuse-devel

Hi Miklos,

On 10/09/2014 12:14 PM, Miklos Szeredi wrote:
> On Wed, Oct 1, 2014 at 1:28 PM, Maxim Patlasov <mpatlasov@parallels.com> wrote:
>> Given those patches must die, do you have any ideas how to resolve that
>> "spurious EBUSY" problem?
> Check the "sync_release" branch of fuse:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse.git sync_release
>
> And same branch name for libfuse:
>
>    git://git.code.sf.net/p/fuse/fuse sync_release
>
> What it does is send RELEASE from ->flush() after checking the
> refcount of file (being careful about RCU accesses).
>
> Lightly tested, more testing, as well as review, is welcome.

Thank you very much for efforts, highly appreciated! I've had a close 
look at your patches and found a few issues. Most of them can be easily 
fixed, but one puzzles me: the way how you detect last flush is not race 
free. Something as simple as:

int main(int argc, char *argv[])
{
     int fd = open(argv[1], O_RDWR);
     fork();
}

may easily dive into fuse_try_sync_release() concurrently and both 
observe file->f_count == 2. Then both return falling back to sending the 
release asynchronously. This makes sync/async behaviour unpredictable 
even for well-behaved applications which don't do any esoteric things 
like racing i/o with close or exiting while a descriptor is in-flight in 
a unix domain socket.

I cannot see any way to recognise last flush without help of VFS layer, 
can you?

Thanks,
Maxim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-16 10:31               ` Maxim Patlasov
@ 2014-10-16 13:43                 ` Miklos Szeredi
  2014-10-16 13:54                   ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2014-10-16 13:43 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: Linus Torvalds, Anand Avati, Linux Kernel Mailing List,
	Michael j Theall, fuse-devel

On Thu, Oct 16, 2014 at 12:31 PM, Maxim Patlasov
<mpatlasov@parallels.com> wrote:

> Something as simple as:
>
> int main(int argc, char *argv[])
> {
>     int fd = open(argv[1], O_RDWR);
>     fork();
> }
>
> may easily dive into fuse_try_sync_release() concurrently and both observe
> file->f_count == 2. Then both return falling back to sending the release
> asynchronously. This makes sync/async behaviour unpredictable even for
> well-behaved applications which don't do any esoteric things like racing i/o
> with close or exiting while a descriptor is in-flight in a unix domain
> socket.
>
> I cannot see any way to recognise last flush without help of VFS layer, can
> you?

No.

One idea is to change ->flush() so it's responsible for fput()-ing the
file.  That way we could take control of the actual refcount
decrement.  There are only 20 flush instances in the tree, so it
wouldn't be a huge change.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-16 13:43                 ` Miklos Szeredi
@ 2014-10-16 13:54                   ` Linus Torvalds
  2014-10-17  8:55                     ` Miklos Szeredi
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2014-10-16 13:54 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Maxim Patlasov, Anand Avati, Linux Kernel Mailing List,
	Michael j Theall, fuse-devel

On Thu, Oct 16, 2014 at 3:43 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> One idea is to change ->flush() so it's responsible for fput()-ing the
> file.  That way we could take control of the actual refcount
> decrement.  There are only 20 flush instances in the tree, so it
> wouldn't be a huge change.

Since that *still* wouldn't fix the problem with the whole "count
elevated by other things" issue, I really don't want to hear about
these random broken hacks that are fundamentally broken crap.

Really. Stop cc'ing me with "let's implement this hack that cannot
work in general". I'm not interested. There's a reason we don't do
this. We don't make up random hacks that we know cannot work in the
general case.

            Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-16 13:54                   ` Linus Torvalds
@ 2014-10-17  8:55                     ` Miklos Szeredi
  2014-10-18 15:35                       ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Miklos Szeredi @ 2014-10-17  8:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maxim Patlasov, Anand Avati, Linux Kernel Mailing List,
	Michael j Theall, fuse-devel

On Thu, Oct 16, 2014 at 03:54:42PM +0200, Linus Torvalds wrote:
> On Thu, Oct 16, 2014 at 3:43 PM, Miklos Szeredi <miklos@szeredi.hu> wrote:
> >
> > One idea is to change ->flush() so it's responsible for fput()-ing the
> > file.  That way we could take control of the actual refcount
> > decrement.  There are only 20 flush instances in the tree, so it
> > wouldn't be a huge change.
> 
> Since that *still* wouldn't fix the problem with the whole "count
> elevated by other things" issue, I really don't want to hear about
> these random broken hacks that are fundamentally broken crap.

The problem with those "count elevated by other things" is that they are
actually bugs.  Take the following test case (this is not made up, I really got
bug reports agains something like this).

mount("/dev/foo", "/mnt");
fd = creat("/mnt/bar");
close(fd);
umount("/mnt")

If that umount fails with EBUSY, that's a bug, since we've released the only
known resource we've opened on that mount.

Now, if some completely unrelated "lsof" instance goes fishing in /proc, then
that will be able to hold that release up, making the test fail.

And it's actually trivially fixable.  See attached patch.

Thanks,
Miklos

diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index e11d7c590bb0..f8a67dafb04f 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -19,6 +19,8 @@ static int seq_show(struct seq_file *m, void *v)
 {
 	struct files_struct *files = NULL;
 	int f_flags = 0, ret = -ENOENT;
+	loff_t f_pos;
+	int mnt_id;
 	struct file *file = NULL;
 	struct task_struct *task;
 
@@ -41,7 +43,13 @@ static int seq_show(struct seq_file *m, void *v)
 			if (close_on_exec(fd, fdt))
 				f_flags |= O_CLOEXEC;
 
-			get_file(file);
+			f_pos = file->f_pos;
+			mnt_id = real_mount(file->f_path.mnt)->mnt_id;
+
+			if (file->f_op->show_fdinfo)
+				get_file(file);
+			else
+				file = NULL;
 			ret = 0;
 		}
 		spin_unlock(&files->file_lock);
@@ -50,11 +58,11 @@ static int seq_show(struct seq_file *m, void *v)
 
 	if (!ret) {
 		seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\n",
-			   (long long)file->f_pos, f_flags,
-			   real_mount(file->f_path.mnt)->mnt_id);
-		if (file->f_op->show_fdinfo)
+			   (long long) f_pos, f_flags, mnt_id);
+		if (file) {
 			ret = file->f_op->show_fdinfo(m, file);
-		fput(file);
+			fput(file);
+		}
 	}
 
 	return ret;

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-17  8:55                     ` Miklos Szeredi
@ 2014-10-18 15:35                       ` Linus Torvalds
  2014-10-18 15:40                         ` Linus Torvalds
  0 siblings, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2014-10-18 15:35 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Maxim Patlasov, Anand Avati, Linux Kernel Mailing List,
	Michael j Theall, fuse-devel

On Fri, Oct 17, 2014 at 1:55 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> The problem with those "count elevated by other things" is that they are
> actually bugs.

No they aren't. You think they are, and then you find one case, and
ignore all the others.

Look around for AIO. Look around for the loop driver. Look around for
a number of things that do "fget()" and that you completely ignored.

So no, your patch doesn't change the fundamental issue in any way,
shape, or form.

I asked you to stop emailing me with these broken "let's fix one
special case, because I can't be bothered to understand the big
picture" patches. This was another one of those hacky sh*t patches
that doesn't actually change the deeper issue. Stop it. Seriously.
This idiotic thread has been going on for too long.

              Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-18 15:35                       ` Linus Torvalds
@ 2014-10-18 15:40                         ` Linus Torvalds
  2014-10-18 18:01                           ` Miklos Szeredi
  2014-10-18 18:22                           ` Al Viro
  0 siblings, 2 replies; 33+ messages in thread
From: Linus Torvalds @ 2014-10-18 15:40 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Maxim Patlasov, Anand Avati, Linux Kernel Mailing List,
	Michael j Theall, fuse-devel

On Sat, Oct 18, 2014 at 8:35 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Look around for AIO. Look around for the loop driver. Look around for
> a number of things that do "fget()" and that you completely ignored.

.. actually, there are more instances of "get_file()" than of
"fget()", the aio one just happened to be the latter form. Lots and
lots of ways to get ahold of a file descriptor that keeps it open past
the "last close".

               Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-18 15:40                         ` Linus Torvalds
@ 2014-10-18 18:01                           ` Miklos Szeredi
  2014-10-18 18:24                             ` Al Viro
  2014-10-18 18:38                             ` Linus Torvalds
  2014-10-18 18:22                           ` Al Viro
  1 sibling, 2 replies; 33+ messages in thread
From: Miklos Szeredi @ 2014-10-18 18:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Maxim Patlasov, Anand Avati, Linux Kernel Mailing List,
	Michael j Theall, fuse-devel

On Sat, Oct 18, 2014 at 5:40 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Sat, Oct 18, 2014 at 8:35 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> Look around for AIO. Look around for the loop driver. Look around for
>> a number of things that do "fget()" and that you completely ignored.
>
> .. actually, there are more instances of "get_file()" than of
> "fget()", the aio one just happened to be the latter form. Lots and
> lots of ways to get ahold of a file descriptor that keeps it open past
> the "last close".

And what you don't get is that there's a deep difference between those
and the /proc file access case.

And the difference is that one is done because of an explicit action
by the holder of the open file.  And the other is done by some random
process doing non-invasive examination of the holder of the open-file.

So basically: we simply don't care if last close does not happen to
release the file *iff* it was because of some explicit action that
obviously has or could have such a side effect.  Is that so hard to
understand?

In other words, we care about doing that last release synchronously if
it provably is the last release of that  file and happens to be done
from close() (or munmap()).  And then all your examples of loop driver
and aio are pointless, because we *know* they will be holding onto
that descriptor, the same as we know, that after dup(), close() will
not release the file and the (non-IDIOTIX) locks together with the
file.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-18 15:40                         ` Linus Torvalds
  2014-10-18 18:01                           ` Miklos Szeredi
@ 2014-10-18 18:22                           ` Al Viro
  2014-10-18 22:44                             ` Eric W. Biederman
  1 sibling, 1 reply; 33+ messages in thread
From: Al Viro @ 2014-10-18 18:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Miklos Szeredi, Maxim Patlasov, Anand Avati,
	Linux Kernel Mailing List, Michael j Theall, fuse-devel,
	linux-fsdevel

On Sat, Oct 18, 2014 at 08:40:05AM -0700, Linus Torvalds wrote:
> On Sat, Oct 18, 2014 at 8:35 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Look around for AIO. Look around for the loop driver. Look around for
> > a number of things that do "fget()" and that you completely ignored.
> 
> .. actually, there are more instances of "get_file()" than of
> "fget()", the aio one just happened to be the latter form. Lots and
> lots of ways to get ahold of a file descriptor that keeps it open past
> the "last close".

FWIW, procfs patch touches a very annoying issue: ->show_fdinfo() being
blocking.  I would really like to get rid of that particular get_file()
and even more so - of get_files_struct() in there.

I certainly agree that anyone who expects that close() means the end of IO
is completely misguided.  Mappings don't disappear on close(), neither does
a descriptor returned by dup(), or one that child got over fork(),
or something sent over in SCM_RIGHTS datagram, or, as you suggested, made
backing store for /dev/loop, etc.

What's more, in the example given upthread, somebody might've spotted that
file in /proc/<pid>/fd/* and *opened* it.  At which point umount would
have to fail with EBUSY.  And the same lsof(8) might've done just that.

It's not a matter of correctness or security, especially since somebody who
could do that, could've stopped your process, PTRACE_POKEd a fairly short
series of syscalls that would connect to AF_UNIX socket, send the file
over to them and clean after itself, then single-stepped through all of that,
restored the original state and resumed your process.  

It is a QoI matter, though.  And get_files_struct() in there is a lot more
annoying than get_file()/fput().  Suppose you catch the process during
exit().  All of a sudden, read from /proc/<pid>/fdinfo/<n> ends up doing
shitloads of filp_close().  It would be nice to avoid that.

Folks, how much pain would it be to make ->show_fdinfo() non-blocking?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-18 18:01                           ` Miklos Szeredi
@ 2014-10-18 18:24                             ` Al Viro
  2014-10-18 18:45                               ` Al Viro
  2014-10-18 18:38                             ` Linus Torvalds
  1 sibling, 1 reply; 33+ messages in thread
From: Al Viro @ 2014-10-18 18:24 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Maxim Patlasov, Anand Avati,
	Linux Kernel Mailing List, Michael j Theall, fuse-devel

On Sat, Oct 18, 2014 at 08:01:13PM +0200, Miklos Szeredi wrote:

> And what you don't get is that there's a deep difference between those
> and the /proc file access case.
> 
> And the difference is that one is done because of an explicit action
> by the holder of the open file.  And the other is done by some random
> process doing non-invasive examination of the holder of the open-file.

Such as ls -l /proc/*/fd/*?  That will give you the same transient EBUSY
from umount, if it hits the right moment...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-18 18:01                           ` Miklos Szeredi
  2014-10-18 18:24                             ` Al Viro
@ 2014-10-18 18:38                             ` Linus Torvalds
  1 sibling, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2014-10-18 18:38 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Maxim Patlasov, Anand Avati, Linux Kernel Mailing List,
	Michael j Theall, fuse-devel

On Sat, Oct 18, 2014 at 11:01 AM, Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> And what you don't get is that there's a deep difference between those
> and the /proc file access case.

No there isn't.

Your "action by the holder" argument is pure and utter garbage, for a
very simple and core reason: the *filesystem* doesn't know or care. So
from a fuse standpoint, the difference is totally and entirely
irrelevant.

Your "synchronous fput" fails. It fails totally regardless of that
"who incremented the file user count" issue. Face it, your patch is
broken. And it's *fundamentally* broken, which is why I'm so tired of
your stupid ad-hoc hacks that cannot possibly work.

Your umount EBUSY case is somewhat relevant to the "who holds the file
open", but quite frankly, that's not a filesystem issue, it's more of
a system management issue. So umount might get EBUSY. That's not
something the filesystem should/could care about, that's a MIS issue
that is entirely irrelevant, and the answer is "if somebody does lsof,
that might keep the filesystem busy". Tough.

             Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-18 18:24                             ` Al Viro
@ 2014-10-18 18:45                               ` Al Viro
  0 siblings, 0 replies; 33+ messages in thread
From: Al Viro @ 2014-10-18 18:45 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: Linus Torvalds, Maxim Patlasov, Anand Avati,
	Linux Kernel Mailing List, Michael j Theall, fuse-devel

On Sat, Oct 18, 2014 at 07:24:54PM +0100, Al Viro wrote:
> On Sat, Oct 18, 2014 at 08:01:13PM +0200, Miklos Szeredi wrote:
> 
> > And what you don't get is that there's a deep difference between those
> > and the /proc file access case.
> > 
> > And the difference is that one is done because of an explicit action
> > by the holder of the open file.  And the other is done by some random
> > process doing non-invasive examination of the holder of the open-file.
> 
> Such as ls -l /proc/*/fd/*?  That will give you the same transient EBUSY
> from umount, if it hits the right moment...

stat -L /proc/*/fd/*, that is...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/5] fuse: handle release synchronously (v4)
  2014-10-18 18:22                           ` Al Viro
@ 2014-10-18 22:44                             ` Eric W. Biederman
  0 siblings, 0 replies; 33+ messages in thread
From: Eric W. Biederman @ 2014-10-18 22:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Miklos Szeredi, Maxim Patlasov, Anand Avati,
	Linux Kernel Mailing List, Michael j Theall, fuse-devel,
	linux-fsdevel

Al Viro <viro@ZenIV.linux.org.uk> writes:

> On Sat, Oct 18, 2014 at 08:40:05AM -0700, Linus Torvalds wrote:
>> On Sat, Oct 18, 2014 at 8:35 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>> >
>> > Look around for AIO. Look around for the loop driver. Look around for
>> > a number of things that do "fget()" and that you completely ignored.
>> 
>> .. actually, there are more instances of "get_file()" than of
>> "fget()", the aio one just happened to be the latter form. Lots and
>> lots of ways to get ahold of a file descriptor that keeps it open past
>> the "last close".
>
> FWIW, procfs patch touches a very annoying issue: ->show_fdinfo() being
> blocking.  I would really like to get rid of that particular get_file()
> and even more so - of get_files_struct() in there.
>
> I certainly agree that anyone who expects that close() means the end of IO
> is completely misguided.  Mappings don't disappear on close(), neither does
> a descriptor returned by dup(), or one that child got over fork(),
> or something sent over in SCM_RIGHTS datagram, or, as you suggested, made
> backing store for /dev/loop, etc.
>
> What's more, in the example given upthread, somebody might've spotted that
> file in /proc/<pid>/fd/* and *opened* it.  At which point umount would
> have to fail with EBUSY.  And the same lsof(8) might've done just that.
>
> It's not a matter of correctness or security, especially since somebody who
> could do that, could've stopped your process, PTRACE_POKEd a fairly short
> series of syscalls that would connect to AF_UNIX socket, send the file
> over to them and clean after itself, then single-stepped through all of that,
> restored the original state and resumed your process.  
>
> It is a QoI matter, though.  And get_files_struct() in there is a lot more
> annoying than get_file()/fput().  Suppose you catch the process during
> exit().  All of a sudden, read from /proc/<pid>/fdinfo/<n> ends up doing
> shitloads of filp_close().  It would be nice to avoid that.
>
> Folks, how much pain would it be to make ->show_fdinfo() non-blocking?

I took a quick look and there are a couple of instances in tun,
eventpoll, and fanotify/inotify that take a spinlock while traversing
the data that needs to be printed.

So it would take a good hard stare at those pieces of code to understand
the locking, and potentially rewrite those routines.

The only one I am particularly familiar with tun did not look
fundamentally hard to change but it also isn't something I would
casually do either, as it would be easy to introduce nasty races by
accident.

Eric

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 3/5] fuse: wait for end of IO on release
  2014-06-06 13:27 [PATCH 0/5] fuse: close file synchronously (v2) Maxim Patlasov
@ 2014-06-06 13:29 ` Maxim Patlasov
  0 siblings, 0 replies; 33+ messages in thread
From: Maxim Patlasov @ 2014-06-06 13:29 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, linux-kernel

There are two types of I/O activity that can be "in progress" at the time
of fuse_release() execution: asynchronous read-ahead and write-back. The
patch ensures that they are completed before fuse_release_common sends
FUSE_RELEASE to userspace.

So far as fuse_release() waits for end of async I/O, its callbacks
(fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
be the last holders of fuse file anymore. To emphasize the fact, the patch
replaces fuse_file_put with __fuse_file_put there.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b81a945..d50af99 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -149,6 +149,17 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
 	}
 }
 
+/*
+ * Asynchronous callbacks may use it instead of fuse_file_put() because
+ * we guarantee that they are never last holders of ff. Hitting BUG() below
+ * will make clear any violation of the guarantee.
+ */
+static void __fuse_file_put(struct fuse_file *ff)
+{
+	if (atomic_dec_and_test(&ff->count))
+		BUG();
+}
+
 int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 		 bool isdir)
 {
@@ -302,6 +313,13 @@ void fuse_release_common(struct file *file, int opcode)
 	req->misc.release.path = file->f_path;
 
 	/*
+	 * No more in-flight asynchronous READ or WRITE requests if
+	 * fuse file release is synchronous
+	 */
+	if (ff->fc->close_wait)
+		BUG_ON(atomic_read(&ff->count) != 1);
+
+	/*
 	 * Normally this will send the RELEASE request, however if
 	 * some asynchronous READ or WRITE requests are outstanding,
 	 * the sending will be delayed.
@@ -321,11 +339,34 @@ static int fuse_open(struct inode *inode, struct file *file)
 static int fuse_release(struct inode *inode, struct file *file)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_file *ff = file->private_data;
 
 	/* see fuse_vma_close() for !writeback_cache case */
 	if (fc->writeback_cache)
 		write_inode_now(inode, 1);
 
+	if (ff->fc->close_wait) {
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		/*
+		 * Must remove file from write list. Otherwise it is possible
+		 * this file will get more writeback from another files
+		 * rerouted via write_files.
+		 */
+		spin_lock(&ff->fc->lock);
+		list_del_init(&ff->write_entry);
+		spin_unlock(&ff->fc->lock);
+
+		wait_event(fi->page_waitq, atomic_read(&ff->count) == 1);
+
+		/*
+		 * spin_unlock_wait(&ff->fc->lock) would be natural here to
+		 * wait for threads just released ff to leave their critical
+		 * sections. But taking spinlock is the first thing
+		 * fuse_release_common does, so that this is unnecessary.
+		 */
+	}
+
 	fuse_release_common(file, FUSE_RELEASE);
 
 	/* return value is ignored by VFS */
@@ -823,8 +864,17 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
 		unlock_page(page);
 		page_cache_release(page);
 	}
-	if (req->ff)
-		fuse_file_put(req->ff, false);
+	if (req->ff) {
+		if (fc->close_wait) {
+			struct fuse_inode *fi = get_fuse_inode(req->inode);
+
+			spin_lock(&fc->lock);
+			__fuse_file_put(req->ff);
+			wake_up(&fi->page_waitq);
+			spin_unlock(&fc->lock);
+		} else
+			fuse_file_put(req->ff, false);
+	}
 }
 
 struct fuse_fill_data {
@@ -851,6 +901,7 @@ static void fuse_send_readpages(struct fuse_fill_data *data)
 	if (fc->async_read) {
 		req->ff = fuse_file_get(ff);
 		req->end = fuse_readpages_end;
+		req->inode = data->inode;
 		fuse_request_send_background(fc, req);
 	} else {
 		fuse_request_send(fc, req);
@@ -1537,7 +1588,7 @@ static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
 	for (i = 0; i < req->num_pages; i++)
 		__free_page(req->pages[i]);
 
-	if (req->ff)
+	if (req->ff && !fc->close_wait)
 		fuse_file_put(req->ff, false);
 }
 
@@ -1554,6 +1605,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
 		dec_zone_page_state(req->pages[i], NR_WRITEBACK_TEMP);
 		bdi_writeout_inc(bdi);
 	}
+	if (fc->close_wait)
+		__fuse_file_put(req->ff);
 	wake_up(&fi->page_waitq);
 }
 
@@ -1694,8 +1747,16 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 
 	ff = __fuse_write_file_get(fc, fi);
 	err = fuse_flush_times(inode, ff);
-	if (ff)
-		fuse_file_put(ff, 0);
+	if (ff) {
+		if (fc->close_wait) {
+			spin_lock(&fc->lock);
+			__fuse_file_put(ff);
+			wake_up(&fi->page_waitq);
+			spin_unlock(&fc->lock);
+
+		} else
+			fuse_file_put(ff, false);
+	}
 
 	return err;
 }


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/5] fuse: wait for end of IO on release
  2013-01-02 20:35   ` Brian Foster
@ 2013-01-15 14:04     ` Maxim V. Patlasov
  0 siblings, 0 replies; 33+ messages in thread
From: Maxim V. Patlasov @ 2013-01-15 14:04 UTC (permalink / raw)
  To: Brian Foster
  Cc: miklos, dev, xemul, fuse-devel, linux-kernel, devel, anand.avati

Hi Brian,

01/03/2013 12:35 AM, Brian Foster пишет:
> On 12/20/2012 07:31 AM, Maxim Patlasov wrote:
>> There are two types of I/O activity that can be "in progress" at the time
>> of fuse_release() execution: asynchronous read-ahead and write-back. The
>> patch ensures that they are completed before fuse_release_common sends
>> FUSE_RELEASE to userspace.
>>
>> So far as fuse_release() waits for end of async I/O, its callbacks
>> (fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
>> be the last holders of fuse file anymore. To emphasize the fact, the patch
>> replaces fuse_file_put with __fuse_file_put there.
>>
>> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
>> ---
>>   fs/fuse/file.c |   55 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>>   1 files changed, 52 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index 4f23134..aed9be2 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -137,6 +137,12 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
>>   	}
>>   }
>>   
>> +static void __fuse_file_put(struct fuse_file *ff)
>> +{
>> +	if (atomic_dec_and_test(&ff->count))
>> +		BUG();
>> +}
>> +
> I think a comment in or before this function to explain the reasoning
> for the BUG would be helpful.
>
>>   int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
>>   		 bool isdir)
>>   {
>> @@ -260,7 +266,12 @@ void fuse_release_common(struct file *file, int opcode)
>>   	 * Make the release synchronous if this is a fuseblk mount,
>>   	 * synchronous RELEASE is allowed (and desirable) in this case
>>   	 * because the server can be trusted not to screw up.
>> +	 *
>> +	 * We might wait for them (asynchronous READ or WRITE requests), so:
>>   	 */
>> +	if (ff->fc->close_wait)
>> +		BUG_ON(atomic_read(&ff->count) != 1);
>> +
> It might be cleaner to pull the new part of the comment and the BUG_ON()
> check to before the existing comment and fuse_file_put (e.g., create a
> new comment).
>
>>   	fuse_file_put(ff, ff->fc->destroy_req != NULL);
>>   }
>>   
>> @@ -271,6 +282,31 @@ static int fuse_open(struct inode *inode, struct file *file)
>>   
>>   static int fuse_release(struct inode *inode, struct file *file)
>>   {
>> +	struct fuse_file *ff = file->private_data;
>> +
>> +	if (ff->fc->close_wait) {
>> +		struct fuse_inode *fi = get_fuse_inode(inode);
>> +
>> +		/*
>> +		 * Must remove file from write list. Otherwise it is possible
>> +		 * this file will get more writeback from another files
>> +		 * rerouted via write_files.
>> +		 */
>> +		spin_lock(&ff->fc->lock);
>> +		list_del_init(&ff->write_entry);
>> +		spin_unlock(&ff->fc->lock);
>> +
>> +		wait_event(fi->page_waitq, atomic_read(&ff->count) == 1);
>> +
>> +		/*
>> +		 * Wait for threads just released ff to leave their critical
>> +		 * sections. Taking spinlock is the first thing
>> +		 * fuse_release_common does, so that this is unnecessary, but
>> +		 * it is still good to emphasize right here, that we need this.
>> +		 */
>> +		spin_unlock_wait(&ff->fc->lock);
> I'm all for clarity, but if the wait is unnecessary, perhaps just leave
> the comment..? Just my .02.
>
> Aside from the few nits here, the set looks pretty good to me.

Thanks for review, the suggestions look reasonable. I'll resend 
corrected patch soon.

Maxim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/5] fuse: wait for end of IO on release
  2012-12-20 12:31 ` [PATCH 3/5] fuse: wait for end of IO on release Maxim Patlasov
@ 2013-01-02 20:35   ` Brian Foster
  2013-01-15 14:04     ` Maxim V. Patlasov
  0 siblings, 1 reply; 33+ messages in thread
From: Brian Foster @ 2013-01-02 20:35 UTC (permalink / raw)
  To: Maxim Patlasov
  Cc: miklos, dev, xemul, fuse-devel, linux-kernel, devel, anand.avati

On 12/20/2012 07:31 AM, Maxim Patlasov wrote:
> There are two types of I/O activity that can be "in progress" at the time
> of fuse_release() execution: asynchronous read-ahead and write-back. The
> patch ensures that they are completed before fuse_release_common sends
> FUSE_RELEASE to userspace.
> 
> So far as fuse_release() waits for end of async I/O, its callbacks
> (fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
> be the last holders of fuse file anymore. To emphasize the fact, the patch
> replaces fuse_file_put with __fuse_file_put there.
> 
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
>  fs/fuse/file.c |   55 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
>  1 files changed, 52 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 4f23134..aed9be2 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -137,6 +137,12 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
>  	}
>  }
>  
> +static void __fuse_file_put(struct fuse_file *ff)
> +{
> +	if (atomic_dec_and_test(&ff->count))
> +		BUG();
> +}
> +

I think a comment in or before this function to explain the reasoning
for the BUG would be helpful.

>  int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
>  		 bool isdir)
>  {
> @@ -260,7 +266,12 @@ void fuse_release_common(struct file *file, int opcode)
>  	 * Make the release synchronous if this is a fuseblk mount,
>  	 * synchronous RELEASE is allowed (and desirable) in this case
>  	 * because the server can be trusted not to screw up.
> +	 *
> +	 * We might wait for them (asynchronous READ or WRITE requests), so:
>  	 */
> +	if (ff->fc->close_wait)
> +		BUG_ON(atomic_read(&ff->count) != 1);
> +

It might be cleaner to pull the new part of the comment and the BUG_ON()
check to before the existing comment and fuse_file_put (e.g., create a
new comment).

>  	fuse_file_put(ff, ff->fc->destroy_req != NULL);
>  }
>  
> @@ -271,6 +282,31 @@ static int fuse_open(struct inode *inode, struct file *file)
>  
>  static int fuse_release(struct inode *inode, struct file *file)
>  {
> +	struct fuse_file *ff = file->private_data;
> +
> +	if (ff->fc->close_wait) {
> +		struct fuse_inode *fi = get_fuse_inode(inode);
> +
> +		/*
> +		 * Must remove file from write list. Otherwise it is possible
> +		 * this file will get more writeback from another files
> +		 * rerouted via write_files.
> +		 */
> +		spin_lock(&ff->fc->lock);
> +		list_del_init(&ff->write_entry);
> +		spin_unlock(&ff->fc->lock);
> +
> +		wait_event(fi->page_waitq, atomic_read(&ff->count) == 1);
> +
> +		/*
> +		 * Wait for threads just released ff to leave their critical
> +		 * sections. Taking spinlock is the first thing
> +		 * fuse_release_common does, so that this is unnecessary, but
> +		 * it is still good to emphasize right here, that we need this.
> +		 */
> +		spin_unlock_wait(&ff->fc->lock);

I'm all for clarity, but if the wait is unnecessary, perhaps just leave
the comment..? Just my .02.

Aside from the few nits here, the set looks pretty good to me.

Brian

> +	}
> +
>  	fuse_release_common(file, FUSE_RELEASE);
>  
>  	/* return value is ignored by VFS */
> @@ -610,8 +646,17 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
>  		unlock_page(page);
>  		page_cache_release(page);
>  	}
> -	if (req->ff)
> -		fuse_file_put(req->ff, false);
> +	if (req->ff) {
> +		if (fc->close_wait) {
> +			struct fuse_inode *fi = get_fuse_inode(req->inode);
> +
> +			spin_lock(&fc->lock);
> +			__fuse_file_put(req->ff);
> +			wake_up(&fi->page_waitq);
> +			spin_unlock(&fc->lock);
> +		} else
> +			fuse_file_put(req->ff, false);
> +	}
>  }
>  
>  struct fuse_fill_data {
> @@ -637,6 +682,7 @@ static void fuse_send_readpages(struct fuse_fill_data *data)
>  	if (fc->async_read) {
>  		req->ff = fuse_file_get(ff);
>  		req->end = fuse_readpages_end;
> +		req->inode = data->inode;
>  		fuse_request_send_background(fc, req);
>  	} else {
>  		fuse_request_send(fc, req);
> @@ -1178,7 +1224,8 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf,
>  static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
>  {
>  	__free_page(req->pages[0]);
> -	fuse_file_put(req->ff, false);
> +	if (!fc->close_wait)
> +		fuse_file_put(req->ff, false);
>  }
>  
>  static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
> @@ -1191,6 +1238,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
>  	dec_bdi_stat(bdi, BDI_WRITEBACK);
>  	dec_zone_page_state(req->pages[0], NR_WRITEBACK_TEMP);
>  	bdi_writeout_inc(bdi);
> +	if (fc->close_wait)
> +		__fuse_file_put(req->ff);
>  	wake_up(&fi->page_waitq);
>  }
>  
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 3/5] fuse: wait for end of IO on release
  2012-12-20 12:30 [PATCH 0/5] fuse: close file synchronously Maxim Patlasov
@ 2012-12-20 12:31 ` Maxim Patlasov
  2013-01-02 20:35   ` Brian Foster
  0 siblings, 1 reply; 33+ messages in thread
From: Maxim Patlasov @ 2012-12-20 12:31 UTC (permalink / raw)
  To: miklos; +Cc: dev, xemul, fuse-devel, bfoster, linux-kernel, devel, anand.avati

There are two types of I/O activity that can be "in progress" at the time
of fuse_release() execution: asynchronous read-ahead and write-back. The
patch ensures that they are completed before fuse_release_common sends
FUSE_RELEASE to userspace.

So far as fuse_release() waits for end of async I/O, its callbacks
(fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
be the last holders of fuse file anymore. To emphasize the fact, the patch
replaces fuse_file_put with __fuse_file_put there.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   55 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 4f23134..aed9be2 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -137,6 +137,12 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
 	}
 }
 
+static void __fuse_file_put(struct fuse_file *ff)
+{
+	if (atomic_dec_and_test(&ff->count))
+		BUG();
+}
+
 int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 		 bool isdir)
 {
@@ -260,7 +266,12 @@ void fuse_release_common(struct file *file, int opcode)
 	 * Make the release synchronous if this is a fuseblk mount,
 	 * synchronous RELEASE is allowed (and desirable) in this case
 	 * because the server can be trusted not to screw up.
+	 *
+	 * We might wait for them (asynchronous READ or WRITE requests), so:
 	 */
+	if (ff->fc->close_wait)
+		BUG_ON(atomic_read(&ff->count) != 1);
+
 	fuse_file_put(ff, ff->fc->destroy_req != NULL);
 }
 
@@ -271,6 +282,31 @@ static int fuse_open(struct inode *inode, struct file *file)
 
 static int fuse_release(struct inode *inode, struct file *file)
 {
+	struct fuse_file *ff = file->private_data;
+
+	if (ff->fc->close_wait) {
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		/*
+		 * Must remove file from write list. Otherwise it is possible
+		 * this file will get more writeback from another files
+		 * rerouted via write_files.
+		 */
+		spin_lock(&ff->fc->lock);
+		list_del_init(&ff->write_entry);
+		spin_unlock(&ff->fc->lock);
+
+		wait_event(fi->page_waitq, atomic_read(&ff->count) == 1);
+
+		/*
+		 * Wait for threads just released ff to leave their critical
+		 * sections. Taking spinlock is the first thing
+		 * fuse_release_common does, so that this is unnecessary, but
+		 * it is still good to emphasize right here, that we need this.
+		 */
+		spin_unlock_wait(&ff->fc->lock);
+	}
+
 	fuse_release_common(file, FUSE_RELEASE);
 
 	/* return value is ignored by VFS */
@@ -610,8 +646,17 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
 		unlock_page(page);
 		page_cache_release(page);
 	}
-	if (req->ff)
-		fuse_file_put(req->ff, false);
+	if (req->ff) {
+		if (fc->close_wait) {
+			struct fuse_inode *fi = get_fuse_inode(req->inode);
+
+			spin_lock(&fc->lock);
+			__fuse_file_put(req->ff);
+			wake_up(&fi->page_waitq);
+			spin_unlock(&fc->lock);
+		} else
+			fuse_file_put(req->ff, false);
+	}
 }
 
 struct fuse_fill_data {
@@ -637,6 +682,7 @@ static void fuse_send_readpages(struct fuse_fill_data *data)
 	if (fc->async_read) {
 		req->ff = fuse_file_get(ff);
 		req->end = fuse_readpages_end;
+		req->inode = data->inode;
 		fuse_request_send_background(fc, req);
 	} else {
 		fuse_request_send(fc, req);
@@ -1178,7 +1224,8 @@ static ssize_t fuse_direct_write(struct file *file, const char __user *buf,
 static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
 {
 	__free_page(req->pages[0]);
-	fuse_file_put(req->ff, false);
+	if (!fc->close_wait)
+		fuse_file_put(req->ff, false);
 }
 
 static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
@@ -1191,6 +1238,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
 	dec_bdi_stat(bdi, BDI_WRITEBACK);
 	dec_zone_page_state(req->pages[0], NR_WRITEBACK_TEMP);
 	bdi_writeout_inc(bdi);
+	if (fc->close_wait)
+		__fuse_file_put(req->ff);
 	wake_up(&fi->page_waitq);
 }
 


^ permalink raw reply related	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2014-10-18 22:45 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-25 12:05 [PATCH 0/5] fuse: handle release synchronously (v4) Maxim Patlasov
2014-09-25 12:05 ` [PATCH 1/5] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags Maxim Patlasov
2014-09-25 12:05 ` [PATCH 2/5] fuse: cosmetic rework of fuse_send_readpages Maxim Patlasov
2014-09-25 12:06 ` [PATCH 3/5] fuse: wait for end of IO on release Maxim Patlasov
2014-09-25 12:06 ` [PATCH 4/5] fuse: add mount option to disable synchronous release Maxim Patlasov
2014-09-25 12:07 ` [PATCH 5/5] fuse: enable close_wait " Maxim Patlasov
2014-09-26 15:28 ` [PATCH 0/5] fuse: handle release synchronously (v4) Miklos Szeredi
2014-09-30  3:15   ` Miklos Szeredi
2014-09-30  3:55 ` Linus Torvalds
     [not found]   ` <CAFboF2yhGyjk4e_CHQV5b2WvB-QhsWNyHvFiFG_OM_=3-KArLQ@mail.gmail.com>
2014-09-30  7:43     ` Miklos Szeredi
2014-09-30 18:22     ` Linus Torvalds
2014-09-30 19:04     ` Linus Torvalds
2014-09-30 19:19       ` Miklos Szeredi
2014-09-30 20:44         ` Linus Torvalds
2014-10-01  3:47           ` Miklos Szeredi
2014-10-01 11:28           ` Maxim Patlasov
2014-10-09  8:14             ` Miklos Szeredi
2014-10-16 10:31               ` Maxim Patlasov
2014-10-16 13:43                 ` Miklos Szeredi
2014-10-16 13:54                   ` Linus Torvalds
2014-10-17  8:55                     ` Miklos Szeredi
2014-10-18 15:35                       ` Linus Torvalds
2014-10-18 15:40                         ` Linus Torvalds
2014-10-18 18:01                           ` Miklos Szeredi
2014-10-18 18:24                             ` Al Viro
2014-10-18 18:45                               ` Al Viro
2014-10-18 18:38                             ` Linus Torvalds
2014-10-18 18:22                           ` Al Viro
2014-10-18 22:44                             ` Eric W. Biederman
  -- strict thread matches above, loose matches on Subject: below --
2014-06-06 13:27 [PATCH 0/5] fuse: close file synchronously (v2) Maxim Patlasov
2014-06-06 13:29 ` [PATCH 3/5] fuse: wait for end of IO on release Maxim Patlasov
2012-12-20 12:30 [PATCH 0/5] fuse: close file synchronously Maxim Patlasov
2012-12-20 12:31 ` [PATCH 3/5] fuse: wait for end of IO on release Maxim Patlasov
2013-01-02 20:35   ` Brian Foster
2013-01-15 14:04     ` Maxim V. Patlasov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.