linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] fuse: handle release synchronously (v3)
@ 2014-08-21 16:07 Maxim Patlasov
  2014-08-21 16:08 ` [PATCH 1/6] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags Maxim Patlasov
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-21 16:07 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

Hi,

There is a long-standing demand for synchronous behaviour of fuse_release:

http://sourceforge.net/mailarchive/message.php?msg_id=19343889
http://sourceforge.net/mailarchive/message.php?msg_id=29814693

A year ago Avati and me explained why such a feature would be useful:

http://sourceforge.net/mailarchive/message.php?msg_id=29889055
http://sourceforge.net/mailarchive/message.php?msg_id=29867423

In short, the problem is that fuse_release (that's usually called on last
user close(2)) sends FUSE_RELEASE to userspace and returns without waiting
for ACK from userspace. Consequently, there is a gap when user regards the
file released while userspace fuse is still working on it. An attempt to
access the file from another node leads to complicated synchronization
problems because the first node still "holds" the file.

The patch-set resolves the problem by making fuse_release synchronous:
wait for ACK from userspace for FUSE_RELEASE if the feature is ON.

It must be emphasized that even if the feature is enabled (i.e. fuse_release
is synchronous), nothing guarantees that fuse_release() will be called
in the context of close(2). In fact, it may happen later, on last fput().
However, there are still a lot of use cases when close(2) is synchronous,
so the feature must be regarded as an optimization maximizing chances of
synchronous behaviour of close(2).

To keep single-threaded userspace implementations happy the patch-set
ensures that by the time fuse_release_common calls fuse_file_put, no
more in-flight I/O exists. Asynchronous fuse callbacks (like
fuse_readpages_end) cannot trigger FUSE_RELEASE anymore. Hence, we'll
never block in contexts other than close().

Changed in v2:
 - improved comments, commented spin_unlock_wait out according to Brian'
suggestions.
 - rebased on v3.15-rc8 tag of Linus' tree.
Changed in v3:
 - changed patch-set title from "close file synchronously" to "handle
   release synchronously"
 - renamed CLOSE_WAIT to SYNC_RELEASE to convey the essence of the feature
   ("synchronous release" instead of "wait on close")
 - enabled feature on per file basis (instead of global fuse_conn parameter)
 - added "disable_sync_release" mount option
 - rebased on v3.17-rc1 tag of Linus' tree.

Thanks,
Maxim

---

Maxim Patlasov (6):
      fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags
      fuse: cosmetic rework of fuse_send_readpages
      fuse: wait for end of IO on release
      fuse: enable close_wait synchronous release
      fuse: fix synchronous case of fuse_file_put()
      fuse: add mount option to disable synchronous release


 fs/fuse/file.c            |  106 ++++++++++++++++++++++++++++++++++++++-------
 fs/fuse/fuse_i.h          |    3 +
 fs/fuse/inode.c           |    8 +++
 include/uapi/linux/fuse.h |    3 +
 4 files changed, 104 insertions(+), 16 deletions(-)

-- 
Signature

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/6] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags
  2014-08-21 16:07 [PATCH 0/6] fuse: handle release synchronously (v3) Maxim Patlasov
@ 2014-08-21 16:08 ` Maxim Patlasov
  2014-08-21 16:08 ` [PATCH 2/6] fuse: cosmetic rework of fuse_send_readpages Maxim Patlasov
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-21 16:08 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

The feature will be governed by fuse file open flag FOPEN_SYNC_RELEASE.
Userspace can enable it on per file basis in the same way as for
FOPEN_KEEP_CACHE or FOPEN_DIRECT_IO.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 include/uapi/linux/fuse.h |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 25084a0..607c45c 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -205,10 +205,13 @@ struct fuse_file_lock {
  * FOPEN_DIRECT_IO: bypass page cache for this open file
  * FOPEN_KEEP_CACHE: don't invalidate the data cache on open
  * FOPEN_NONSEEKABLE: the file is not seekable
+ * FOPEN_SYNC_RELEASE: synchronously release file on last fput,
+ *                     which, in turn, not always bound to fclose(2)!
  */
 #define FOPEN_DIRECT_IO		(1 << 0)
 #define FOPEN_KEEP_CACHE	(1 << 1)
 #define FOPEN_NONSEEKABLE	(1 << 2)
+#define FOPEN_SYNC_RELEASE	(1 << 3)
 
 /**
  * INIT request/reply flags


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/6] fuse: cosmetic rework of fuse_send_readpages
  2014-08-21 16:07 [PATCH 0/6] fuse: handle release synchronously (v3) Maxim Patlasov
  2014-08-21 16:08 ` [PATCH 1/6] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags Maxim Patlasov
@ 2014-08-21 16:08 ` Maxim Patlasov
  2014-08-21 16:08 ` [PATCH 3/6] fuse: wait for end of IO on release Maxim Patlasov
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-21 16:08 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

The patch change arguments of fuse_send_readpages to give it access to inode
(will be used in the next patch of patch-set). The change is cosmetic,
no logic changed.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 912061a..7723b3f 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -827,8 +827,17 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
 		fuse_file_put(req->ff, false);
 }
 
-static void fuse_send_readpages(struct fuse_req *req, struct file *file)
+struct fuse_fill_data {
+	struct fuse_req *req;
+	struct file *file;
+	struct inode *inode;
+	unsigned nr_pages;
+};
+
+static void fuse_send_readpages(struct fuse_fill_data *data)
 {
+	struct fuse_req *req = data->req;
+	struct file *file = data->file;
 	struct fuse_file *ff = file->private_data;
 	struct fuse_conn *fc = ff->fc;
 	loff_t pos = page_offset(req->pages[0]);
@@ -850,13 +859,6 @@ static void fuse_send_readpages(struct fuse_req *req, struct file *file)
 	}
 }
 
-struct fuse_fill_data {
-	struct fuse_req *req;
-	struct file *file;
-	struct inode *inode;
-	unsigned nr_pages;
-};
-
 static int fuse_readpages_fill(void *_data, struct page *page)
 {
 	struct fuse_fill_data *data = _data;
@@ -872,7 +874,7 @@ static int fuse_readpages_fill(void *_data, struct page *page)
 	     req->pages[req->num_pages - 1]->index + 1 != page->index)) {
 		int nr_alloc = min_t(unsigned, data->nr_pages,
 				     FUSE_MAX_PAGES_PER_REQ);
-		fuse_send_readpages(req, data->file);
+		fuse_send_readpages(data);
 		if (fc->async_read)
 			req = fuse_get_req_for_background(fc, nr_alloc);
 		else
@@ -925,7 +927,7 @@ static int fuse_readpages(struct file *file, struct address_space *mapping,
 	err = read_cache_pages(mapping, pages, fuse_readpages_fill, &data);
 	if (!err) {
 		if (data.req->num_pages)
-			fuse_send_readpages(data.req, file);
+			fuse_send_readpages(&data);
 		else
 			fuse_put_request(fc, data.req);
 	}


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/6] fuse: wait for end of IO on release
  2014-08-21 16:07 [PATCH 0/6] fuse: handle release synchronously (v3) Maxim Patlasov
  2014-08-21 16:08 ` [PATCH 1/6] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags Maxim Patlasov
  2014-08-21 16:08 ` [PATCH 2/6] fuse: cosmetic rework of fuse_send_readpages Maxim Patlasov
@ 2014-08-21 16:08 ` Maxim Patlasov
  2014-08-22 14:00   ` Miklos Szeredi
  2014-08-26  8:42   ` [PATCH 3/6] fuse: wait for end of IO on release (v2) Maxim Patlasov
  2014-08-21 16:09 ` [PATCH 4/6] fuse: enable close_wait synchronous release Maxim Patlasov
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-21 16:08 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

There are two types of I/O activity that can be "in progress" at the time
of fuse_release() execution: asynchronous read-ahead and write-back. The
patch ensures that they are completed before fuse_release_common sends
FUSE_RELEASE to userspace.

So far as fuse_release() waits for end of async I/O, its callbacks
(fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
be the last holders of fuse file anymore. To emphasize the fact, the patch
replaces fuse_file_put with __fuse_file_put there.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   76 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 71 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7723b3f..73bce1b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -149,6 +149,17 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
 	}
 }
 
+/*
+ * Asynchronous callbacks may use it instead of fuse_file_put() because
+ * we guarantee that they are never last holders of ff. Hitting BUG() below
+ * will make clear any violation of the guarantee.
+ */
+static void __fuse_file_put(struct fuse_file *ff)
+{
+	if (atomic_dec_and_test(&ff->count))
+		BUG();
+}
+
 int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 		 bool isdir)
 {
@@ -279,6 +290,11 @@ static void fuse_prepare_release(struct fuse_file *ff, int flags, int opcode)
 	req->in.args[0].value = inarg;
 }
 
+static bool must_release_synchronously(struct fuse_file *ff)
+{
+	return ff->open_flags & FOPEN_SYNC_RELEASE;
+}
+
 void fuse_release_common(struct file *file, int opcode)
 {
 	struct fuse_file *ff;
@@ -302,6 +318,13 @@ void fuse_release_common(struct file *file, int opcode)
 	req->misc.release.path = file->f_path;
 
 	/*
+	 * No more in-flight asynchronous READ or WRITE requests if
+	 * fuse file release is synchronous
+	 */
+	if (must_release_synchronously(ff))
+		BUG_ON(atomic_read(&ff->count) != 1);
+
+	/*
 	 * Normally this will send the RELEASE request, however if
 	 * some asynchronous READ or WRITE requests are outstanding,
 	 * the sending will be delayed.
@@ -321,11 +344,34 @@ static int fuse_open(struct inode *inode, struct file *file)
 static int fuse_release(struct inode *inode, struct file *file)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_file *ff = file->private_data;
 
 	/* see fuse_vma_close() for !writeback_cache case */
 	if (fc->writeback_cache)
 		write_inode_now(inode, 1);
 
+	if (must_release_synchronously(ff)) {
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		/*
+		 * Must remove file from write list. Otherwise it is possible
+		 * this file will get more writeback from another files
+		 * rerouted via write_files.
+		 */
+		spin_lock(&ff->fc->lock);
+		list_del_init(&ff->write_entry);
+		spin_unlock(&ff->fc->lock);
+
+		wait_event(fi->page_waitq, atomic_read(&ff->count) == 1);
+
+		/*
+		 * spin_unlock_wait(&ff->fc->lock) would be natural here to
+		 * wait for threads just released ff to leave their critical
+		 * sections. But taking spinlock is the first thing
+		 * fuse_release_common does, so that this is unnecessary.
+		 */
+	}
+
 	fuse_release_common(file, FUSE_RELEASE);
 
 	/* return value is ignored by VFS */
@@ -823,8 +869,17 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
 		unlock_page(page);
 		page_cache_release(page);
 	}
-	if (req->ff)
-		fuse_file_put(req->ff, false);
+	if (req->ff) {
+		if (must_release_synchronously(req->ff)) {
+			struct fuse_inode *fi = get_fuse_inode(req->inode);
+
+			spin_lock(&fc->lock);
+			__fuse_file_put(req->ff);
+			wake_up(&fi->page_waitq);
+			spin_unlock(&fc->lock);
+		} else
+			fuse_file_put(req->ff, false);
+	}
 }
 
 struct fuse_fill_data {
@@ -851,6 +906,7 @@ static void fuse_send_readpages(struct fuse_fill_data *data)
 	if (fc->async_read) {
 		req->ff = fuse_file_get(ff);
 		req->end = fuse_readpages_end;
+		req->inode = data->inode;
 		fuse_request_send_background(fc, req);
 	} else {
 		fuse_request_send(fc, req);
@@ -1502,7 +1558,7 @@ static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
 	for (i = 0; i < req->num_pages; i++)
 		__free_page(req->pages[i]);
 
-	if (req->ff)
+	if (req->ff && !must_release_synchronously(req->ff))
 		fuse_file_put(req->ff, false);
 }
 
@@ -1519,6 +1575,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
 		dec_zone_page_state(req->pages[i], NR_WRITEBACK_TEMP);
 		bdi_writeout_inc(bdi);
 	}
+	if (must_release_synchronously(req->ff))
+		__fuse_file_put(req->ff);
 	wake_up(&fi->page_waitq);
 }
 
@@ -1659,8 +1717,16 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 
 	ff = __fuse_write_file_get(fc, fi);
 	err = fuse_flush_times(inode, ff);
-	if (ff)
-		fuse_file_put(ff, 0);
+	if (ff) {
+		if (must_release_synchronously(ff)) {
+			spin_lock(&fc->lock);
+			__fuse_file_put(ff);
+			wake_up(&fi->page_waitq);
+			spin_unlock(&fc->lock);
+
+		} else
+			fuse_file_put(ff, false);
+	}
 
 	return err;
 }


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/6] fuse: enable close_wait synchronous release
  2014-08-21 16:07 [PATCH 0/6] fuse: handle release synchronously (v3) Maxim Patlasov
                   ` (2 preceding siblings ...)
  2014-08-21 16:08 ` [PATCH 3/6] fuse: wait for end of IO on release Maxim Patlasov
@ 2014-08-21 16:09 ` Maxim Patlasov
  2014-08-22 14:04   ` Miklos Szeredi
  2014-08-21 16:09 ` [PATCH 5/6] fuse: fix synchronous case of fuse_file_put() Maxim Patlasov
  2014-08-21 16:09 ` [PATCH 6/6] fuse: add mount option to disable synchronous release Maxim Patlasov
  5 siblings, 1 reply; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-21 16:09 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

The patch enables the feature by passing 'true' to fuse_file_put in
fuse_release_common.

Previously, this was safe only in special cases when we sure that
multi-threaded userspace won't deadlock if we'll synchronously send
FUSE_RELEASE in the context of read-ahead or write-back callback. Now, it's
always safe because callbacks don't send requests to userspace anymore.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 73bce1b..cd55488 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -333,7 +333,8 @@ void fuse_release_common(struct file *file, int opcode)
 	 * synchronous RELEASE is allowed (and desirable) in this case
 	 * because the server can be trusted not to screw up.
 	 */
-	fuse_file_put(ff, ff->fc->destroy_req != NULL);
+	fuse_file_put(ff, ff->fc->destroy_req != NULL ||
+			  must_release_synchronously(ff));
 }
 
 static int fuse_open(struct inode *inode, struct file *file)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/6] fuse: fix synchronous case of fuse_file_put()
  2014-08-21 16:07 [PATCH 0/6] fuse: handle release synchronously (v3) Maxim Patlasov
                   ` (3 preceding siblings ...)
  2014-08-21 16:09 ` [PATCH 4/6] fuse: enable close_wait synchronous release Maxim Patlasov
@ 2014-08-21 16:09 ` Maxim Patlasov
  2014-08-22 14:08   ` Miklos Szeredi
  2014-08-21 16:09 ` [PATCH 6/6] fuse: add mount option to disable synchronous release Maxim Patlasov
  5 siblings, 1 reply; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-21 16:09 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

If fuse_file_put() is called with sync==true, the user may be blocked for
a while, until userspace ACKs our FUSE_RELEASE request. This blocking must be
uninterruptible. Otherwise request could be interrupted, but file association
in user space remains.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index cd55488..b92143a 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -136,6 +136,10 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
 			path_put(&req->misc.release.path);
 			fuse_put_request(ff->fc, req);
 		} else if (sync) {
+			/* Must force. Otherwise request could be interrupted,
+			 * but file association in user space remains.
+			 */
+			req->force = 1;
 			req->background = 0;
 			fuse_request_send(ff->fc, req);
 			path_put(&req->misc.release.path);


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/6] fuse: add mount option to disable synchronous release
  2014-08-21 16:07 [PATCH 0/6] fuse: handle release synchronously (v3) Maxim Patlasov
                   ` (4 preceding siblings ...)
  2014-08-21 16:09 ` [PATCH 5/6] fuse: fix synchronous case of fuse_file_put() Maxim Patlasov
@ 2014-08-21 16:09 ` Maxim Patlasov
  2014-08-22 14:09   ` Miklos Szeredi
  5 siblings, 1 reply; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-21 16:09 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

Synchronous release ensures that kernel fuse reports to userspace exactly
last fput(). However, nothing guarantees that last fput() will happen in the
context of close(2). So userspace applications must be prepared to the
situation when close(2) returned, but processing of FUSE_RELEASE is not
completed by fuse daemon yet. To make testing such use cases easier, the
patch introduces DISABLE_SYNC_RELEASE mount option which effectively mask out
FOPEN_SYNC_RELEASE flag.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
---
 fs/fuse/file.c   |    3 ++-
 fs/fuse/fuse_i.h |    3 +++
 fs/fuse/inode.c  |    8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b92143a..ad3d357 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -296,7 +296,8 @@ static void fuse_prepare_release(struct fuse_file *ff, int flags, int opcode)
 
 static bool must_release_synchronously(struct fuse_file *ff)
 {
-	return ff->open_flags & FOPEN_SYNC_RELEASE;
+	return ff->open_flags & FOPEN_SYNC_RELEASE &&
+		!(ff->fc->flags & FUSE_DISABLE_SYNC_RELEASE);
 }
 
 void fuse_release_common(struct file *file, int opcode)
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e8e47a6..c5e2fca 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -44,6 +44,9 @@
     doing the mount will be allowed to access the filesystem */
 #define FUSE_ALLOW_OTHER         (1 << 1)
 
+/** Disable synchronous release */
+#define FUSE_DISABLE_SYNC_RELEASE (1 << 2)
+
 /** Number of page pointers embedded in fuse_req */
 #define FUSE_REQ_INLINE_PAGES 1
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 03246cd..86d47d0 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -463,6 +463,7 @@ enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_DISABLE_SYNC_RELEASE,
 	OPT_ERR
 };
 
@@ -475,6 +476,7 @@ static const match_table_t tokens = {
 	{OPT_ALLOW_OTHER,		"allow_other"},
 	{OPT_MAX_READ,			"max_read=%u"},
 	{OPT_BLKSIZE,			"blksize=%u"},
+	{OPT_DISABLE_SYNC_RELEASE,	"disable_sync_release"},
 	{OPT_ERR,			NULL}
 };
 
@@ -560,6 +562,10 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
 			d->blksize = value;
 			break;
 
+		case OPT_DISABLE_SYNC_RELEASE:
+			d->flags |= FUSE_DISABLE_SYNC_RELEASE;
+			break;
+
 		default:
 			return 0;
 		}
@@ -583,6 +589,8 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
 		seq_puts(m, ",default_permissions");
 	if (fc->flags & FUSE_ALLOW_OTHER)
 		seq_puts(m, ",allow_other");
+	if (fc->flags & FUSE_DISABLE_SYNC_RELEASE)
+		seq_puts(m, ",disable_sync_release");
 	if (fc->max_read != ~0)
 		seq_printf(m, ",max_read=%u", fc->max_read);
 	if (sb->s_bdev && sb->s_blocksize != FUSE_DEFAULT_BLKSIZE)


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] fuse: wait for end of IO on release
  2014-08-21 16:08 ` [PATCH 3/6] fuse: wait for end of IO on release Maxim Patlasov
@ 2014-08-22 14:00   ` Miklos Szeredi
  2014-08-25 15:12     ` Maxim Patlasov
  2014-08-26  8:42   ` [PATCH 3/6] fuse: wait for end of IO on release (v2) Maxim Patlasov
  1 sibling, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2014-08-22 14:00 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On Thu, Aug 21, 2014 at 6:08 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
> There are two types of I/O activity that can be "in progress" at the time
> of fuse_release() execution: asynchronous read-ahead and write-back. The
> patch ensures that they are completed before fuse_release_common sends
> FUSE_RELEASE to userspace.
>
> So far as fuse_release() waits for end of async I/O, its callbacks
> (fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
> be the last holders of fuse file anymore. To emphasize the fact, the patch
> replaces fuse_file_put with __fuse_file_put there.

1) spinlock around __fuse_file_put() is unnecessary,
wake_up/wait_event will provide the necessary synchronization.
2) can't we always wait for I/O and just make the actual RELEASE
message sync or async based on the flag?
3) and can't we merge the fuseblk case into this as well?

Thanks,
Miklos

>
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
>  fs/fuse/file.c |   76 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 71 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 7723b3f..73bce1b 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -149,6 +149,17 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
>         }
>  }
>
> +/*
> + * Asynchronous callbacks may use it instead of fuse_file_put() because
> + * we guarantee that they are never last holders of ff. Hitting BUG() below
> + * will make clear any violation of the guarantee.
> + */
> +static void __fuse_file_put(struct fuse_file *ff)
> +{
> +       if (atomic_dec_and_test(&ff->count))
> +               BUG();
> +}
> +
>  int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
>                  bool isdir)
>  {
> @@ -279,6 +290,11 @@ static void fuse_prepare_release(struct fuse_file *ff, int flags, int opcode)
>         req->in.args[0].value = inarg;
>  }
>
> +static bool must_release_synchronously(struct fuse_file *ff)
> +{
> +       return ff->open_flags & FOPEN_SYNC_RELEASE;
> +}
> +
>  void fuse_release_common(struct file *file, int opcode)
>  {
>         struct fuse_file *ff;
> @@ -302,6 +318,13 @@ void fuse_release_common(struct file *file, int opcode)
>         req->misc.release.path = file->f_path;
>
>         /*
> +        * No more in-flight asynchronous READ or WRITE requests if
> +        * fuse file release is synchronous
> +        */
> +       if (must_release_synchronously(ff))
> +               BUG_ON(atomic_read(&ff->count) != 1);
> +
> +       /*
>          * Normally this will send the RELEASE request, however if
>          * some asynchronous READ or WRITE requests are outstanding,
>          * the sending will be delayed.
> @@ -321,11 +344,34 @@ static int fuse_open(struct inode *inode, struct file *file)
>  static int fuse_release(struct inode *inode, struct file *file)
>  {
>         struct fuse_conn *fc = get_fuse_conn(inode);
> +       struct fuse_file *ff = file->private_data;
>
>         /* see fuse_vma_close() for !writeback_cache case */
>         if (fc->writeback_cache)
>                 write_inode_now(inode, 1);
>
> +       if (must_release_synchronously(ff)) {
> +               struct fuse_inode *fi = get_fuse_inode(inode);
> +
> +               /*
> +                * Must remove file from write list. Otherwise it is possible
> +                * this file will get more writeback from another files
> +                * rerouted via write_files.
> +                */
> +               spin_lock(&ff->fc->lock);
> +               list_del_init(&ff->write_entry);
> +               spin_unlock(&ff->fc->lock);
> +
> +               wait_event(fi->page_waitq, atomic_read(&ff->count) == 1);
> +
> +               /*
> +                * spin_unlock_wait(&ff->fc->lock) would be natural here to
> +                * wait for threads just released ff to leave their critical
> +                * sections. But taking spinlock is the first thing
> +                * fuse_release_common does, so that this is unnecessary.
> +                */
> +       }
> +
>         fuse_release_common(file, FUSE_RELEASE);
>
>         /* return value is ignored by VFS */
> @@ -823,8 +869,17 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
>                 unlock_page(page);
>                 page_cache_release(page);
>         }
> -       if (req->ff)
> -               fuse_file_put(req->ff, false);
> +       if (req->ff) {
> +               if (must_release_synchronously(req->ff)) {
> +                       struct fuse_inode *fi = get_fuse_inode(req->inode);
> +
> +                       spin_lock(&fc->lock);
> +                       __fuse_file_put(req->ff);
> +                       wake_up(&fi->page_waitq);
> +                       spin_unlock(&fc->lock);
> +               } else
> +                       fuse_file_put(req->ff, false);
> +       }
>  }
>
>  struct fuse_fill_data {
> @@ -851,6 +906,7 @@ static void fuse_send_readpages(struct fuse_fill_data *data)
>         if (fc->async_read) {
>                 req->ff = fuse_file_get(ff);
>                 req->end = fuse_readpages_end;
> +               req->inode = data->inode;
>                 fuse_request_send_background(fc, req);
>         } else {
>                 fuse_request_send(fc, req);
> @@ -1502,7 +1558,7 @@ static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
>         for (i = 0; i < req->num_pages; i++)
>                 __free_page(req->pages[i]);
>
> -       if (req->ff)
> +       if (req->ff && !must_release_synchronously(req->ff))
>                 fuse_file_put(req->ff, false);
>  }
>
> @@ -1519,6 +1575,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
>                 dec_zone_page_state(req->pages[i], NR_WRITEBACK_TEMP);
>                 bdi_writeout_inc(bdi);
>         }
> +       if (must_release_synchronously(req->ff))
> +               __fuse_file_put(req->ff);
>         wake_up(&fi->page_waitq);
>  }
>
> @@ -1659,8 +1717,16 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
>
>         ff = __fuse_write_file_get(fc, fi);
>         err = fuse_flush_times(inode, ff);
> -       if (ff)
> -               fuse_file_put(ff, 0);
> +       if (ff) {
> +               if (must_release_synchronously(ff)) {
> +                       spin_lock(&fc->lock);
> +                       __fuse_file_put(ff);
> +                       wake_up(&fi->page_waitq);
> +                       spin_unlock(&fc->lock);
> +
> +               } else
> +                       fuse_file_put(ff, false);
> +       }
>
>         return err;
>  }
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] fuse: enable close_wait synchronous release
  2014-08-21 16:09 ` [PATCH 4/6] fuse: enable close_wait synchronous release Maxim Patlasov
@ 2014-08-22 14:04   ` Miklos Szeredi
  2014-08-25 15:27     ` Maxim Patlasov
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2014-08-22 14:04 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On Thu, Aug 21, 2014 at 6:09 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
> The patch enables the feature by passing 'true' to fuse_file_put in
> fuse_release_common.
>
> Previously, this was safe only in special cases when we sure that
> multi-threaded userspace won't deadlock if we'll synchronously send
> FUSE_RELEASE in the context of read-ahead or write-back callback. Now, it's
> always safe because callbacks don't send requests to userspace anymore.

But we do want to make this privileged, as there are unlikely but
possible DoS scenarios with a sync release.

Thanks,
Miklos

>
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
>  fs/fuse/file.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 73bce1b..cd55488 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -333,7 +333,8 @@ void fuse_release_common(struct file *file, int opcode)
>          * synchronous RELEASE is allowed (and desirable) in this case
>          * because the server can be trusted not to screw up.
>          */
> -       fuse_file_put(ff, ff->fc->destroy_req != NULL);
> +       fuse_file_put(ff, ff->fc->destroy_req != NULL ||
> +                         must_release_synchronously(ff));
>  }
>
>  static int fuse_open(struct inode *inode, struct file *file)
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] fuse: fix synchronous case of fuse_file_put()
  2014-08-21 16:09 ` [PATCH 5/6] fuse: fix synchronous case of fuse_file_put() Maxim Patlasov
@ 2014-08-22 14:08   ` Miklos Szeredi
  2014-08-25 15:58     ` Maxim Patlasov
  2014-09-11 16:14     ` Maxim Patlasov
  0 siblings, 2 replies; 19+ messages in thread
From: Miklos Szeredi @ 2014-08-22 14:08 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On Thu, Aug 21, 2014 at 6:09 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
> If fuse_file_put() is called with sync==true, the user may be blocked for
> a while, until userspace ACKs our FUSE_RELEASE request. This blocking must be
> uninterruptible. Otherwise request could be interrupted, but file association
> in user space remains.
>
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
>  fs/fuse/file.c |    4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index cd55488..b92143a 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -136,6 +136,10 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
>                         path_put(&req->misc.release.path);
>                         fuse_put_request(ff->fc, req);
>                 } else if (sync) {
> +                       /* Must force. Otherwise request could be interrupted,
> +                        * but file association in user space remains.
> +                        */
> +                       req->force = 1;
>                         req->background = 0;
>                         fuse_request_send(ff->fc, req);
>                         path_put(&req->misc.release.path);
>


Some thought needs to go into this:  if RELEASE is interrupted, then
we should possibly allow that, effectively backgrounding the request.

The synchronous nature is just an optimization and we really don't
know where we are being interrupted, possibly in a place which very
much *should* allow interruption.

Also fuse really should distinguish fatal and non-fatal interruptions
and handle them accordingly...

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 6/6] fuse: add mount option to disable synchronous release
  2014-08-21 16:09 ` [PATCH 6/6] fuse: add mount option to disable synchronous release Maxim Patlasov
@ 2014-08-22 14:09   ` Miklos Szeredi
  2014-08-22 18:10     ` [fuse-devel] " Anand Avati
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2014-08-22 14:09 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On Thu, Aug 21, 2014 at 6:09 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
> Synchronous release ensures that kernel fuse reports to userspace exactly
> last fput(). However, nothing guarantees that last fput() will happen in the
> context of close(2). So userspace applications must be prepared to the
> situation when close(2) returned, but processing of FUSE_RELEASE is not
> completed by fuse daemon yet. To make testing such use cases easier, the
> patch introduces DISABLE_SYNC_RELEASE mount option which effectively mask out
> FOPEN_SYNC_RELEASE flag.

Why not make this an INIT flag?  Much easier to add support for in userspace.

Thanks,
Miklos

>
> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
> ---
>  fs/fuse/file.c   |    3 ++-
>  fs/fuse/fuse_i.h |    3 +++
>  fs/fuse/inode.c  |    8 ++++++++
>  3 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index b92143a..ad3d357 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -296,7 +296,8 @@ static void fuse_prepare_release(struct fuse_file *ff, int flags, int opcode)
>
>  static bool must_release_synchronously(struct fuse_file *ff)
>  {
> -       return ff->open_flags & FOPEN_SYNC_RELEASE;
> +       return ff->open_flags & FOPEN_SYNC_RELEASE &&
> +               !(ff->fc->flags & FUSE_DISABLE_SYNC_RELEASE);
>  }
>
>  void fuse_release_common(struct file *file, int opcode)
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index e8e47a6..c5e2fca 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -44,6 +44,9 @@
>      doing the mount will be allowed to access the filesystem */
>  #define FUSE_ALLOW_OTHER         (1 << 1)
>
> +/** Disable synchronous release */
> +#define FUSE_DISABLE_SYNC_RELEASE (1 << 2)
> +
>  /** Number of page pointers embedded in fuse_req */
>  #define FUSE_REQ_INLINE_PAGES 1
>
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 03246cd..86d47d0 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -463,6 +463,7 @@ enum {
>         OPT_ALLOW_OTHER,
>         OPT_MAX_READ,
>         OPT_BLKSIZE,
> +       OPT_DISABLE_SYNC_RELEASE,
>         OPT_ERR
>  };
>
> @@ -475,6 +476,7 @@ static const match_table_t tokens = {
>         {OPT_ALLOW_OTHER,               "allow_other"},
>         {OPT_MAX_READ,                  "max_read=%u"},
>         {OPT_BLKSIZE,                   "blksize=%u"},
> +       {OPT_DISABLE_SYNC_RELEASE,      "disable_sync_release"},
>         {OPT_ERR,                       NULL}
>  };
>
> @@ -560,6 +562,10 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
>                         d->blksize = value;
>                         break;
>
> +               case OPT_DISABLE_SYNC_RELEASE:
> +                       d->flags |= FUSE_DISABLE_SYNC_RELEASE;
> +                       break;
> +
>                 default:
>                         return 0;
>                 }
> @@ -583,6 +589,8 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
>                 seq_puts(m, ",default_permissions");
>         if (fc->flags & FUSE_ALLOW_OTHER)
>                 seq_puts(m, ",allow_other");
> +       if (fc->flags & FUSE_DISABLE_SYNC_RELEASE)
> +               seq_puts(m, ",disable_sync_release");
>         if (fc->max_read != ~0)
>                 seq_printf(m, ",max_read=%u", fc->max_read);
>         if (sb->s_bdev && sb->s_blocksize != FUSE_DEFAULT_BLKSIZE)
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [fuse-devel] [PATCH 6/6] fuse: add mount option to disable synchronous release
  2014-08-22 14:09   ` Miklos Szeredi
@ 2014-08-22 18:10     ` Anand Avati
  0 siblings, 0 replies; 19+ messages in thread
From: Anand Avati @ 2014-08-22 18:10 UTC (permalink / raw)
  To: Miklos Szeredi, Maxim Patlasov
  Cc: fuse-devel, Anand Avati, Kernel Mailing List

On 8/22/14, 7:09 AM, Miklos Szeredi wrote:
> On Thu, Aug 21, 2014 at 6:09 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
>> Synchronous release ensures that kernel fuse reports to userspace exactly
>> last fput(). However, nothing guarantees that last fput() will happen in the
>> context of close(2). So userspace applications must be prepared to the
>> situation when close(2) returned, but processing of FUSE_RELEASE is not
>> completed by fuse daemon yet. To make testing such use cases easier, the
>> patch introduces DISABLE_SYNC_RELEASE mount option which effectively mask out
>> FOPEN_SYNC_RELEASE flag.
>
> Why not make this an INIT flag?  Much easier to add support for in userspace.
>
> Thanks,
> Miklos

In a real world scenario you may want to perform synchronous release 
selectively. As such performance over lots of small files is generally 
slow because of context switches, and a synchronous release adds an 
extra switch. In cases like 'grep -r' over lots of text files where 
consistency don't matter or userspace knows data doesn't get modified, 
the userspace can chose to not perform synchronous release with some 
heuristics (like size is small and open flag is O_RDONLY etc.) where as 
for large read/write files (like VM images?) you would want to perform 
synchronous release for consistency purposes. A global flag in INIT 
would make it too crude and impose the same consistency for all files.

Thanks
Avati

>
>>
>> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
>> ---
>> ---
>>   fs/fuse/file.c   |    3 ++-
>>   fs/fuse/fuse_i.h |    3 +++
>>   fs/fuse/inode.c  |    8 ++++++++
>>   3 files changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index b92143a..ad3d357 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -296,7 +296,8 @@ static void fuse_prepare_release(struct fuse_file *ff, int flags, int opcode)
>>
>>   static bool must_release_synchronously(struct fuse_file *ff)
>>   {
>> -       return ff->open_flags & FOPEN_SYNC_RELEASE;
>> +       return ff->open_flags & FOPEN_SYNC_RELEASE &&
>> +               !(ff->fc->flags & FUSE_DISABLE_SYNC_RELEASE);
>>   }
>>
>>   void fuse_release_common(struct file *file, int opcode)
>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>> index e8e47a6..c5e2fca 100644
>> --- a/fs/fuse/fuse_i.h
>> +++ b/fs/fuse/fuse_i.h
>> @@ -44,6 +44,9 @@
>>       doing the mount will be allowed to access the filesystem */
>>   #define FUSE_ALLOW_OTHER         (1 << 1)
>>
>> +/** Disable synchronous release */
>> +#define FUSE_DISABLE_SYNC_RELEASE (1 << 2)
>> +
>>   /** Number of page pointers embedded in fuse_req */
>>   #define FUSE_REQ_INLINE_PAGES 1
>>
>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>> index 03246cd..86d47d0 100644
>> --- a/fs/fuse/inode.c
>> +++ b/fs/fuse/inode.c
>> @@ -463,6 +463,7 @@ enum {
>>          OPT_ALLOW_OTHER,
>>          OPT_MAX_READ,
>>          OPT_BLKSIZE,
>> +       OPT_DISABLE_SYNC_RELEASE,
>>          OPT_ERR
>>   };
>>
>> @@ -475,6 +476,7 @@ static const match_table_t tokens = {
>>          {OPT_ALLOW_OTHER,               "allow_other"},
>>          {OPT_MAX_READ,                  "max_read=%u"},
>>          {OPT_BLKSIZE,                   "blksize=%u"},
>> +       {OPT_DISABLE_SYNC_RELEASE,      "disable_sync_release"},
>>          {OPT_ERR,                       NULL}
>>   };
>>
>> @@ -560,6 +562,10 @@ static int parse_fuse_opt(char *opt, struct fuse_mount_data *d, int is_bdev)
>>                          d->blksize = value;
>>                          break;
>>
>> +               case OPT_DISABLE_SYNC_RELEASE:
>> +                       d->flags |= FUSE_DISABLE_SYNC_RELEASE;
>> +                       break;
>> +
>>                  default:
>>                          return 0;
>>                  }
>> @@ -583,6 +589,8 @@ static int fuse_show_options(struct seq_file *m, struct dentry *root)
>>                  seq_puts(m, ",default_permissions");
>>          if (fc->flags & FUSE_ALLOW_OTHER)
>>                  seq_puts(m, ",allow_other");
>> +       if (fc->flags & FUSE_DISABLE_SYNC_RELEASE)
>> +               seq_puts(m, ",disable_sync_release");
>>          if (fc->max_read != ~0)
>>                  seq_printf(m, ",max_read=%u", fc->max_read);
>>          if (sb->s_bdev && sb->s_blocksize != FUSE_DEFAULT_BLKSIZE)
>>
>
> ------------------------------------------------------------------------------
> Slashdot TV.
> Video for Nerds.  Stuff that matters.
> http://tv.slashdot.org/
> _______________________________________________
> fuse-devel mailing list
> fuse-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/fuse-devel
>


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/6] fuse: wait for end of IO on release
  2014-08-22 14:00   ` Miklos Szeredi
@ 2014-08-25 15:12     ` Maxim Patlasov
  0 siblings, 0 replies; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-25 15:12 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On 08/22/2014 06:00 PM, Miklos Szeredi wrote:
> On Thu, Aug 21, 2014 at 6:08 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
>> There are two types of I/O activity that can be "in progress" at the time
>> of fuse_release() execution: asynchronous read-ahead and write-back. The
>> patch ensures that they are completed before fuse_release_common sends
>> FUSE_RELEASE to userspace.
>>
>> So far as fuse_release() waits for end of async I/O, its callbacks
>> (fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
>> be the last holders of fuse file anymore. To emphasize the fact, the patch
>> replaces fuse_file_put with __fuse_file_put there.
> 1) spinlock around __fuse_file_put() is unnecessary,
> wake_up/wait_event will provide the necessary synchronization.
Yes, I agree.

> 2) can't we always wait for I/O and just make the actual RELEASE
> message sync or async based on the flag?
> 3) and can't we merge the fuseblk case into this as well?

I think this is doable, but the same argument that Anand suggested for 
doing sync release selectively:

> In a real world scenario you may want to perform synchronous release 
> selectively. As such performance over lots of small files is generally 
> slow because of context switches, and a synchronous release adds an 
> extra switch.

is applicable here: if an application opened a file read-only and read a 
block initiating read-ahead, it's not obvious why the app doing close(2) 
should always wait for the end of that read-ahead. For some fs it's 
desirable, for others is not. Let's fuse daemon decide which behaviour 
is preferable.

Thanks,
Maxim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] fuse: enable close_wait synchronous release
  2014-08-22 14:04   ` Miklos Szeredi
@ 2014-08-25 15:27     ` Maxim Patlasov
  0 siblings, 0 replies; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-25 15:27 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On 08/22/2014 06:04 PM, Miklos Szeredi wrote:
> On Thu, Aug 21, 2014 at 6:09 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
>> The patch enables the feature by passing 'true' to fuse_file_put in
>> fuse_release_common.
>>
>> Previously, this was safe only in special cases when we sure that
>> multi-threaded userspace won't deadlock if we'll synchronously send
>> FUSE_RELEASE in the context of read-ahead or write-back callback. Now, it's
>> always safe because callbacks don't send requests to userspace anymore.
> But we do want to make this privileged, as there are unlikely but
> possible DoS scenarios with a sync release.

The latest patch of the set implements DISABLE_SYNC_RELEASE mount 
option. We can instrument fusermount to use the option by default for 
unprivileged mounts (allowing system administrator to configure it like 
"user_allow_other"). Do you have a better way to implement DoS 
protection in mind?

Thanks,
Maxim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] fuse: fix synchronous case of fuse_file_put()
  2014-08-22 14:08   ` Miklos Szeredi
@ 2014-08-25 15:58     ` Maxim Patlasov
  2014-09-11 16:14     ` Maxim Patlasov
  1 sibling, 0 replies; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-25 15:58 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On 08/22/2014 06:08 PM, Miklos Szeredi wrote:
> On Thu, Aug 21, 2014 at 6:09 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
>> If fuse_file_put() is called with sync==true, the user may be blocked for
>> a while, until userspace ACKs our FUSE_RELEASE request. This blocking must be
>> uninterruptible. Otherwise request could be interrupted, but file association
>> in user space remains.
>>
>> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
>> ---
>>   fs/fuse/file.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index cd55488..b92143a 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -136,6 +136,10 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
>>                          path_put(&req->misc.release.path);
>>                          fuse_put_request(ff->fc, req);
>>                  } else if (sync) {
>> +                       /* Must force. Otherwise request could be interrupted,
>> +                        * but file association in user space remains.
>> +                        */
>> +                       req->force = 1;
>>                          req->background = 0;
>>                          fuse_request_send(ff->fc, req);
>>                          path_put(&req->misc.release.path);
>>
>
> Some thought needs to go into this:  if RELEASE is interrupted, then
> we should possibly allow that, effectively backgrounding the request.
>
> The synchronous nature is just an optimization and we really don't
> know where we are being interrupted, possibly in a place which very
> much *should* allow interruption.

A fuse daemon who explicitly enables the feature (synchronous release) 
would definitely want non-interruptible behaviour of last fput. 
Otherwise, it would face the same problem that the feature tries to 
resolve: an application was killed and exited, but there is no way to 
determine why actual processing of RELEASE will be completed.

As for fuseblk mounts, I'm not so sure. I believed the lack of force=1 
was a bug and my patch fixes it. If you think it's safer to preserve old 
behaviour, I could set "force" conditionally. May be you could explain 
in more details why you think we should allow interruption somewhere. 
Any examples or use cases? Btw, fuse_flush also uses force=1. Do you 
concerns deal with it as well?

>
> Also fuse really should distinguish fatal and non-fatal interruptions
> and handle them accordingly...

Do you think it's worthy to elaborate this in the scope of "synchronous 
release" feature?

Thanks,
Maxim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/6] fuse: wait for end of IO on release (v2)
  2014-08-21 16:08 ` [PATCH 3/6] fuse: wait for end of IO on release Maxim Patlasov
  2014-08-22 14:00   ` Miklos Szeredi
@ 2014-08-26  8:42   ` Maxim Patlasov
  1 sibling, 0 replies; 19+ messages in thread
From: Maxim Patlasov @ 2014-08-26  8:42 UTC (permalink / raw)
  To: miklos; +Cc: fuse-devel, avati, linux-kernel

There are two types of I/O activity that can be "in progress" at the time
of fuse_release() execution: asynchronous read-ahead and write-back. The
patch ensures that they are completed before fuse_release_common sends
FUSE_RELEASE to userspace.

So far as fuse_release() waits for end of async I/O, its callbacks
(fuse_readpages_end and fuse_writepage_finish) calling fuse_file_put cannot
be the last holders of fuse file anymore. To emphasize the fact, the patch
replaces fuse_file_put with __fuse_file_put there.

Changed in v2 (thanks to Miklos):
 - removed redundant locking around __fuse_file_put()

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/fuse/file.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 5 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 7723b3f..8713e62 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -149,6 +149,17 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
 	}
 }
 
+/*
+ * Asynchronous callbacks may use it instead of fuse_file_put() because
+ * we guarantee that they are never last holders of ff. Hitting BUG() below
+ * will make clear any violation of the guarantee.
+ */
+static void __fuse_file_put(struct fuse_file *ff)
+{
+	if (atomic_dec_and_test(&ff->count))
+		BUG();
+}
+
 int fuse_do_open(struct fuse_conn *fc, u64 nodeid, struct file *file,
 		 bool isdir)
 {
@@ -279,6 +290,11 @@ static void fuse_prepare_release(struct fuse_file *ff, int flags, int opcode)
 	req->in.args[0].value = inarg;
 }
 
+static bool must_release_synchronously(struct fuse_file *ff)
+{
+	return ff->open_flags & FOPEN_SYNC_RELEASE;
+}
+
 void fuse_release_common(struct file *file, int opcode)
 {
 	struct fuse_file *ff;
@@ -302,6 +318,13 @@ void fuse_release_common(struct file *file, int opcode)
 	req->misc.release.path = file->f_path;
 
 	/*
+	 * No more in-flight asynchronous READ or WRITE requests if
+	 * fuse file release is synchronous
+	 */
+	if (must_release_synchronously(ff))
+		BUG_ON(atomic_read(&ff->count) != 1);
+
+	/*
 	 * Normally this will send the RELEASE request, however if
 	 * some asynchronous READ or WRITE requests are outstanding,
 	 * the sending will be delayed.
@@ -321,11 +344,34 @@ static int fuse_open(struct inode *inode, struct file *file)
 static int fuse_release(struct inode *inode, struct file *file)
 {
 	struct fuse_conn *fc = get_fuse_conn(inode);
+	struct fuse_file *ff = file->private_data;
 
 	/* see fuse_vma_close() for !writeback_cache case */
 	if (fc->writeback_cache)
 		write_inode_now(inode, 1);
 
+	if (must_release_synchronously(ff)) {
+		struct fuse_inode *fi = get_fuse_inode(inode);
+
+		/*
+		 * Must remove file from write list. Otherwise it is possible
+		 * this file will get more writeback from another files
+		 * rerouted via write_files.
+		 */
+		spin_lock(&ff->fc->lock);
+		list_del_init(&ff->write_entry);
+		spin_unlock(&ff->fc->lock);
+
+		wait_event(fi->page_waitq, atomic_read(&ff->count) == 1);
+
+		/*
+		 * spin_unlock_wait(&ff->fc->lock) would be natural here to
+		 * wait for threads just released ff to leave their critical
+		 * sections. But taking spinlock is the first thing
+		 * fuse_release_common does, so that this is unnecessary.
+		 */
+	}
+
 	fuse_release_common(file, FUSE_RELEASE);
 
 	/* return value is ignored by VFS */
@@ -823,8 +869,15 @@ static void fuse_readpages_end(struct fuse_conn *fc, struct fuse_req *req)
 		unlock_page(page);
 		page_cache_release(page);
 	}
-	if (req->ff)
-		fuse_file_put(req->ff, false);
+	if (req->ff) {
+		if (must_release_synchronously(req->ff)) {
+			struct fuse_inode *fi = get_fuse_inode(req->inode);
+
+			__fuse_file_put(req->ff);
+			wake_up(&fi->page_waitq);
+		} else
+			fuse_file_put(req->ff, false);
+	}
 }
 
 struct fuse_fill_data {
@@ -851,6 +904,7 @@ static void fuse_send_readpages(struct fuse_fill_data *data)
 	if (fc->async_read) {
 		req->ff = fuse_file_get(ff);
 		req->end = fuse_readpages_end;
+		req->inode = data->inode;
 		fuse_request_send_background(fc, req);
 	} else {
 		fuse_request_send(fc, req);
@@ -1502,7 +1556,7 @@ static void fuse_writepage_free(struct fuse_conn *fc, struct fuse_req *req)
 	for (i = 0; i < req->num_pages; i++)
 		__free_page(req->pages[i]);
 
-	if (req->ff)
+	if (req->ff && !must_release_synchronously(req->ff))
 		fuse_file_put(req->ff, false);
 }
 
@@ -1519,6 +1573,8 @@ static void fuse_writepage_finish(struct fuse_conn *fc, struct fuse_req *req)
 		dec_zone_page_state(req->pages[i], NR_WRITEBACK_TEMP);
 		bdi_writeout_inc(bdi);
 	}
+	if (must_release_synchronously(req->ff))
+		__fuse_file_put(req->ff);
 	wake_up(&fi->page_waitq);
 }
 
@@ -1659,8 +1715,13 @@ int fuse_write_inode(struct inode *inode, struct writeback_control *wbc)
 
 	ff = __fuse_write_file_get(fc, fi);
 	err = fuse_flush_times(inode, ff);
-	if (ff)
-		fuse_file_put(ff, 0);
+	if (ff) {
+		if (must_release_synchronously(ff)) {
+			__fuse_file_put(ff);
+			wake_up(&fi->page_waitq);
+		} else
+			fuse_file_put(ff, false);
+	}
 
 	return err;
 }


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] fuse: fix synchronous case of fuse_file_put()
  2014-08-22 14:08   ` Miklos Szeredi
  2014-08-25 15:58     ` Maxim Patlasov
@ 2014-09-11 16:14     ` Maxim Patlasov
  2014-09-16  8:19       ` Miklos Szeredi
  1 sibling, 1 reply; 19+ messages in thread
From: Maxim Patlasov @ 2014-09-11 16:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On 08/22/2014 06:08 PM, Miklos Szeredi wrote:
> On Thu, Aug 21, 2014 at 6:09 PM, Maxim Patlasov <MPatlasov@parallels.com> wrote:
>> If fuse_file_put() is called with sync==true, the user may be blocked for
>> a while, until userspace ACKs our FUSE_RELEASE request. This blocking must be
>> uninterruptible. Otherwise request could be interrupted, but file association
>> in user space remains.
>>
>> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
>> ---
>>   fs/fuse/file.c |    4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
>> index cd55488..b92143a 100644
>> --- a/fs/fuse/file.c
>> +++ b/fs/fuse/file.c
>> @@ -136,6 +136,10 @@ static void fuse_file_put(struct fuse_file *ff, bool sync)
>>                          path_put(&req->misc.release.path);
>>                          fuse_put_request(ff->fc, req);
>>                  } else if (sync) {
>> +                       /* Must force. Otherwise request could be interrupted,
>> +                        * but file association in user space remains.
>> +                        */
>> +                       req->force = 1;
>>                          req->background = 0;
>>                          fuse_request_send(ff->fc, req);
>>                          path_put(&req->misc.release.path);
>>
>
> Some thought needs to go into this:  if RELEASE is interrupted, then
> we should possibly allow that, effectively backgrounding the request.
>
> The synchronous nature is just an optimization and we really don't
> know where we are being interrupted, possibly in a place which very
> much *should* allow interruption.

I really need your help to proceed with this patch. Could you please 
explain what those places are where we should allow interruption.

BTW, as for "just an optimization", I've recently noticed that __fput() 
calls locks_remove_file(). So guarantees provided by the patch-set are 
on the same level as flock(2) behaviour.

>
> Also fuse really should distinguish fatal and non-fatal interruptions
> and handle them accordingly...

And elaborate on this concern, please.

Thanks,
Maxim

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] fuse: fix synchronous case of fuse_file_put()
  2014-09-11 16:14     ` Maxim Patlasov
@ 2014-09-16  8:19       ` Miklos Szeredi
  2014-09-24  7:19         ` Maxim Patlasov
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2014-09-16  8:19 UTC (permalink / raw)
  To: Maxim Patlasov; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On Thu, Sep 11, 2014 at 6:14 PM, Maxim Patlasov <mpatlasov@parallels.com> wrote:

> I really need your help to proceed with this patch. Could you please explain
> what those places are where we should allow interruption.
>
> BTW, as for "just an optimization", I've recently noticed that __fput()
> calls locks_remove_file(). So guarantees provided by the patch-set are on
> the same level as flock(2) behaviour.

SIGKILL trumps that.  At least that's what I think, and that's what
NFS currently does as well, AFAICS.

>
>>
>> Also fuse really should distinguish fatal and non-fatal interruptions
>> and handle them accordingly...
>
>
> And elaborate on this concern, please.

Requests have two states where they stay for any significant amount of
time: PENDING (queued to userspace) and SENT (in userspace).

Currently we do the following for interrupted requests:

PENDING:
   - non-fatal signal: do nothing
   - fatal signal: dequeue and return -EINTR, unless force is set

SENT:
   - send INTERRUPT request to userspace

This is fine, but fatal interrupts should be able to abort SENT and
forced requests as well without having to wait for the userspace
reply.  This is what I was referring to.

This would not be difficult, were it not for i_mutex and
s_vfs_rename_mutex being held by some operations.   For correctness,
we can't release these while a reply is not received, since the
locking expecations of the userspace filesystem would not be met.
This can be solved by adding shadow locks to fuse that we hold onto
even after the request is interrupted.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] fuse: fix synchronous case of fuse_file_put()
  2014-09-16  8:19       ` Miklos Szeredi
@ 2014-09-24  7:19         ` Maxim Patlasov
  0 siblings, 0 replies; 19+ messages in thread
From: Maxim Patlasov @ 2014-09-24  7:19 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: fuse-devel, Anand Avati, Kernel Mailing List

On 09/16/2014 12:19 PM, Miklos Szeredi wrote:
> On Thu, Sep 11, 2014 at 6:14 PM, Maxim Patlasov <mpatlasov@parallels.com> wrote:
>
>> I really need your help to proceed with this patch. Could you please explain
>> what those places are where we should allow interruption.
>>
>> BTW, as for "just an optimization", I've recently noticed that __fput()
>> calls locks_remove_file(). So guarantees provided by the patch-set are on
>> the same level as flock(2) behaviour.
> SIGKILL trumps that.  At least that's what I think, and that's what
> NFS currently does as well, AFAICS.
>
>>> Also fuse really should distinguish fatal and non-fatal interruptions
>>> and handle them accordingly...
>>
>> And elaborate on this concern, please.
> Requests have two states where they stay for any significant amount of
> time: PENDING (queued to userspace) and SENT (in userspace).
>
> Currently we do the following for interrupted requests:
>
> PENDING:
>     - non-fatal signal: do nothing
>     - fatal signal: dequeue and return -EINTR, unless force is set
>
> SENT:
>     - send INTERRUPT request to userspace
>
> This is fine, but fatal interrupts should be able to abort SENT and
> forced requests as well without having to wait for the userspace
> reply.  This is what I was referring to.

Thank you for detailed clarification, that's much clearer now. If I 
understood it right, fatal signals must abort *any* request in *any* 
state. The only difference between forced and not forced requests is 
that forced ones must be eventually delivered to userspace in all cases 
(even if they were in PENDING state when we were interrupted and we 
returned -EINTR).

The thing that bothers me is the net result of these changes. Yes, 
end-user will be able to interrupt its app by SIGKIILL if it is waiting 
in request_wait_answer(). But there are many other places where kernel 
fuse waits for something dependent on userspace. Do you think we have to 
make those places interruptible as well?

>
> This would not be difficult, were it not for i_mutex and
> s_vfs_rename_mutex being held by some operations.   For correctness,
> we can't release these while a reply is not received, since the
> locking expecations of the userspace filesystem would not be met.
> This can be solved by adding shadow locks to fuse that we hold onto
> even after the request is interrupted.

Shadow locking seems to be not enough. For example, we have to postpone 
FUSE_RELEASE until all interrupted synchronous I/O is ACKed by 
userspace. And similarly we shouldn't surprise userspace by FUSE_DESTROY 
if any requests are still in-flight. May be there are other hidden 
dependencies that don't come to mind now.

Thanks,
Maxim


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-09-24  7:19 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-21 16:07 [PATCH 0/6] fuse: handle release synchronously (v3) Maxim Patlasov
2014-08-21 16:08 ` [PATCH 1/6] fuse: add FOPEN_SYNC_RELEASE flag to ff->open_flags Maxim Patlasov
2014-08-21 16:08 ` [PATCH 2/6] fuse: cosmetic rework of fuse_send_readpages Maxim Patlasov
2014-08-21 16:08 ` [PATCH 3/6] fuse: wait for end of IO on release Maxim Patlasov
2014-08-22 14:00   ` Miklos Szeredi
2014-08-25 15:12     ` Maxim Patlasov
2014-08-26  8:42   ` [PATCH 3/6] fuse: wait for end of IO on release (v2) Maxim Patlasov
2014-08-21 16:09 ` [PATCH 4/6] fuse: enable close_wait synchronous release Maxim Patlasov
2014-08-22 14:04   ` Miklos Szeredi
2014-08-25 15:27     ` Maxim Patlasov
2014-08-21 16:09 ` [PATCH 5/6] fuse: fix synchronous case of fuse_file_put() Maxim Patlasov
2014-08-22 14:08   ` Miklos Szeredi
2014-08-25 15:58     ` Maxim Patlasov
2014-09-11 16:14     ` Maxim Patlasov
2014-09-16  8:19       ` Miklos Szeredi
2014-09-24  7:19         ` Maxim Patlasov
2014-08-21 16:09 ` [PATCH 6/6] fuse: add mount option to disable synchronous release Maxim Patlasov
2014-08-22 14:09   ` Miklos Szeredi
2014-08-22 18:10     ` [fuse-devel] " Anand Avati

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).