linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure
@ 2015-09-17 12:39 Jeff Layton
  2015-09-17 12:39 ` [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Jeff Layton @ 2015-09-17 12:39 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

v2:
- kerneldoc header cleanups. Hopefully they are more clear now
- make fput_queue return bool, telling whether the final reference was
  put. Caller can use that to tell whether it should call
  flush_delayed_fput.

Only minor changes since the last set. Al, does this look any more
reasonable? Original cover letter follows:

I'm breaking this piece out of the open file cache work for nfsd to see
if we can get this piece settled before I re-post the whole set. If this
looks like a reasonable approach we can sort out how it should be merged
(either by you directly, or via Bruce's tree with the rest of the open
file cache patches).

For those just joining in, some background:

We want to add an open file cache for nfsd to reduce the open/close
overhead on READ/WRITE RPCs, and so we can eliminate the raparm cache.
The basic idea is to keep a cache of open files, and close them down on
certain sorts of activity -- primarily, after an unlink that takes the
link count to 0, or before setting a lease.

The setlease part is problematic though. The plan is to have a notifier
callback into nfsd from vfs_setlease that will tell nfsd to close any
open files that are associated with the inode so we don't block lease
attempts solely due to cached but otherwise idle nfsd files. That means
that we need to be able to close out the files and ensure that the final
__fput runs before we try to set a lease.

My latest pass involved making __fput_sync available to userland tasks,
but Al had concerns that that could lead to stack blowouts and locking
issues. This patchset is an alternative approach that allows userland
tasks to use the delayed_fput infrastructure instead. The idea is that
we'd have the pre-setlease notifier do a fput_queue() and then call
flush_delayed_fput to ensure that any queued __fput() calls complete
before setting the lease.

There's also a fix for a potential race in flush_delayed_fput in here
and some doc comment cleanups.

Jeff Layton (4):
  fs: have flush_delayed_fput flush the workqueue job
  fs: add a kerneldoc header to fput
  fs: add fput_queue
  fs: export flush_delayed_fput

 fs/file_table.c      | 76 +++++++++++++++++++++++++++++++++++++++++++---------
 include/linux/file.h |  1 +
 2 files changed, 64 insertions(+), 13 deletions(-)

-- 
2.4.3


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job
  2015-09-17 12:39 [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
@ 2015-09-17 12:39 ` Jeff Layton
  2015-10-04 13:35   ` Jeff Layton
  2015-09-17 12:39 ` [PATCH v2 2/4] fs: add a kerneldoc header to fput Jeff Layton
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2015-09-17 12:39 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

I think there's a potential race in flush_delayed_fput. A kthread does
an fput() and that file gets added to the list and the delayed work is
scheduled. More than 1 jiffy passes, and the workqueue thread picks up
the work and starts running it. Then the kthread calls
flush_delayed_work.  It sees that the list is empty and returns
immediately, even though the __fput for its file may not have run yet.

Close this by making flush_delayed_fput use flush_delayed_work instead,
which should immediately schedule the work to run if it's not already,
and block until the workqueue job completes.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05ebf95..52cc6803c07a 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -244,6 +244,8 @@ static void ____fput(struct callback_head *work)
 	__fput(container_of(work, struct file, f_u.fu_rcuhead));
 }
 
+static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
+
 /*
  * If kernel thread really needs to have the final fput() it has done
  * to complete, call this.  The only user right now is the boot - we
@@ -256,11 +258,9 @@ static void ____fput(struct callback_head *work)
  */
 void flush_delayed_fput(void)
 {
-	delayed_fput(NULL);
+	flush_delayed_work(&delayed_fput_work);
 }
 
-static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
-
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/4] fs: add a kerneldoc header to fput
  2015-09-17 12:39 [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
  2015-09-17 12:39 ` [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
@ 2015-09-17 12:39 ` Jeff Layton
  2015-09-17 12:39 ` [PATCH v2 3/4] fs: add fput_queue Jeff Layton
  2015-09-17 12:39 ` [PATCH v2 4/4] fs: export flush_delayed_fput Jeff Layton
  3 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2015-09-17 12:39 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

...and move its EXPORT_SYMBOL just below the function.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 52cc6803c07a..8cfeaee6323f 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -261,6 +261,25 @@ void flush_delayed_fput(void)
 	flush_delayed_work(&delayed_fput_work);
 }
 
+/**
+ * fput - put a struct file reference
+ * @file: file of which to put the reference
+ *
+ * This function decrements the reference count for the struct file reference,
+ * and queues it up for destruction if the count goes to zero. In the case of
+ * most tasks we queue it to the task_work infrastructure, which will be run
+ * just before the task returns back to userspace. kthreads however never
+ * return to userspace, so for those we add them to a global list and schedule
+ * a delayed workqueue job to do the final cleanup work.
+ *
+ * Why not just do it synchronously? __fput can involve taking locks of all
+ * sorts, and doing it synchronously means that the callers must take extra care
+ * not to deadlock. That can be very difficult to ensure, so by deferring it
+ * until just before return to userland or to the workqueue, we sidestep that
+ * nastiness. Also, __fput can be quite stack intensive, so doing a final fput
+ * has the possibility of blowing up if we don't take steps to ensure that we
+ * have enough stack space to make it work.
+ */
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
@@ -281,6 +300,7 @@ void fput(struct file *file)
 			schedule_delayed_work(&delayed_fput_work, 1);
 	}
 }
+EXPORT_SYMBOL(fput);
 
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
@@ -299,7 +319,6 @@ void __fput_sync(struct file *file)
 	}
 }
 
-EXPORT_SYMBOL(fput);
 
 void put_filp(struct file *file)
 {
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 3/4] fs: add fput_queue
  2015-09-17 12:39 [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
  2015-09-17 12:39 ` [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
  2015-09-17 12:39 ` [PATCH v2 2/4] fs: add a kerneldoc header to fput Jeff Layton
@ 2015-09-17 12:39 ` Jeff Layton
  2015-09-17 12:39 ` [PATCH v2 4/4] fs: export flush_delayed_fput Jeff Layton
  3 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2015-09-17 12:39 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

When nfsd caches a file, we want to be able to close it down in advance
of setlease attempts. Setting a lease is generally done at the behest of
userland, so we need a mechanism to ensure that a userland task can
completely close a file without having to return back to userspace.

To do this, we borrow the delayed_fput infrastructure that kthreads use.
fput_queue will queue to the delayed_fput list if the last reference was
put. The caller can then call flush_delayed_fput to ensure that the files
are completely closed before proceeding.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c      | 27 +++++++++++++++++++++++++++
 include/linux/file.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/fs/file_table.c b/fs/file_table.c
index 8cfeaee6323f..95361d2b8a08 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -302,6 +302,33 @@ void fput(struct file *file)
 }
 EXPORT_SYMBOL(fput);
 
+/**
+ * fput_queue - do an fput without using task_work
+ * @file: file of which to put the reference
+ *
+ * When fput is called in the context of a userland process, it'll queue the
+ * actual work (__fput()) to be done just before returning to userland. In some
+ * cases however, we need to ensure that the __fput runs before that point.
+ * There is no safe way to flush work that has been queued via task_work_add
+ * however, so to do this we borrow the delayed_fput infrastructure that
+ * kthreads use. The userland process can use fput_queue() on one or more
+ * struct files and then call flush_delayed_fput() to ensure that they are
+ * completely closed before proceeding.
+ *
+ * Returns true if the final fput was done, false otherwise. The caller can
+ * use this to determine whether to flush_delayed_fput afterward.
+ */
+bool fput_queue(struct file *file)
+{
+	if (atomic_long_dec_and_test(&file->f_count)) {
+		if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
+			schedule_delayed_work(&delayed_fput_work, 1);
+		return true;
+	}
+	return false;
+}
+EXPORT_SYMBOL(fput_queue);
+
 /*
  * synchronous analog of fput(); for kernel threads that might be needed
  * in some umount() (and thus can't use flush_delayed_fput() without
diff --git a/include/linux/file.h b/include/linux/file.h
index f87d30882a24..f9308c9a0746 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -12,6 +12,7 @@
 struct file;
 
 extern void fput(struct file *);
+extern bool fput_queue(struct file *);
 
 struct file_operations;
 struct vfsmount;
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 4/4] fs: export flush_delayed_fput
  2015-09-17 12:39 [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
                   ` (2 preceding siblings ...)
  2015-09-17 12:39 ` [PATCH v2 3/4] fs: add fput_queue Jeff Layton
@ 2015-09-17 12:39 ` Jeff Layton
  3 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2015-09-17 12:39 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

...and clean up the comments over it a bit. The nfsd code will need to
be able to call back into this.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
 fs/file_table.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index 95361d2b8a08..899c19687cfa 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -246,20 +246,24 @@ static void ____fput(struct callback_head *work)
 
 static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
-/*
- * If kernel thread really needs to have the final fput() it has done
- * to complete, call this.  The only user right now is the boot - we
- * *do* need to make sure our writes to binaries on initramfs has
- * not left us with opened struct file waiting for __fput() - execve()
- * won't work without that.  Please, don't add more callers without
- * very good reasons; in particular, never call that with locks
- * held and never call that from a thread that might need to do
- * some work on any kind of umount.
+/**
+ * flush_delayed_fput - ensure that all delayed_fput work is complete
+ *
+ * If kernel thread or task that has used fput_queue really needs to have the
+ * final fput() it has done to complete, call this. One of the main users is
+ * the boot - we *do* need to make sure our writes to binaries on initramfs has
+ * not left us with opened struct file waiting for __fput() - execve() won't
+ * work without that.
+ *
+ * Please, don't add more callers without very good reasons; in particular,
+ * never call that with locks held and never from a thread that might need to
+ * do some work on any kind of umount.
  */
 void flush_delayed_fput(void)
 {
 	flush_delayed_work(&delayed_fput_work);
 }
+EXPORT_SYMBOL(flush_delayed_fput);
 
 /**
  * fput - put a struct file reference
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job
  2015-09-17 12:39 ` [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
@ 2015-10-04 13:35   ` Jeff Layton
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2015-10-04 13:35 UTC (permalink / raw)
  To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel

On Thu, 17 Sep 2015 08:39:44 -0400
Jeff Layton <jlayton@poochiereds.net> wrote:

> I think there's a potential race in flush_delayed_fput. A kthread does
> an fput() and that file gets added to the list and the delayed work is
> scheduled. More than 1 jiffy passes, and the workqueue thread picks up
> the work and starts running it. Then the kthread calls
> flush_delayed_work.  It sees that the list is empty and returns
> immediately, even though the __fput for its file may not have run yet.
> 
> Close this by making flush_delayed_fput use flush_delayed_work instead,
> which should immediately schedule the work to run if it's not already,
> and block until the workqueue job completes.
> 
> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> ---
>  fs/file_table.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 

It should be noted that the only current user of flush_delayed_fput()
can call it before the workqueue threads are ever started.

Looking at the code, I *think* this will still do the right thing --
block until those threads are started and then flush the work as usual,
but do let me know if I've misread it.

> diff --git a/fs/file_table.c b/fs/file_table.c
> index ad17e05ebf95..52cc6803c07a 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -244,6 +244,8 @@ static void ____fput(struct callback_head *work)
>  	__fput(container_of(work, struct file, f_u.fu_rcuhead));
>  }
>  
> +static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
> +
>  /*
>   * If kernel thread really needs to have the final fput() it has done
>   * to complete, call this.  The only user right now is the boot - we
> @@ -256,11 +258,9 @@ static void ____fput(struct callback_head *work)
>   */
>  void flush_delayed_fput(void)
>  {
> -	delayed_fput(NULL);
> +	flush_delayed_work(&delayed_fput_work);
>  }
>  
> -static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
> -
>  void fput(struct file *file)
>  {
>  	if (atomic_long_dec_and_test(&file->f_count)) {


-- 
Jeff Layton <jlayton@poochiereds.net>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-10-04 13:35 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-17 12:39 [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
2015-09-17 12:39 ` [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
2015-10-04 13:35   ` Jeff Layton
2015-09-17 12:39 ` [PATCH v2 2/4] fs: add a kerneldoc header to fput Jeff Layton
2015-09-17 12:39 ` [PATCH v2 3/4] fs: add fput_queue Jeff Layton
2015-09-17 12:39 ` [PATCH v2 4/4] fs: export flush_delayed_fput Jeff Layton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).