* [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job
2015-09-17 12:39 [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
@ 2015-09-17 12:39 ` Jeff Layton
2015-10-04 13:35 ` Jeff Layton
2015-09-17 12:39 ` [PATCH v2 2/4] fs: add a kerneldoc header to fput Jeff Layton
` (2 subsequent siblings)
3 siblings, 1 reply; 6+ messages in thread
From: Jeff Layton @ 2015-09-17 12:39 UTC (permalink / raw)
To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel
I think there's a potential race in flush_delayed_fput. A kthread does
an fput() and that file gets added to the list and the delayed work is
scheduled. More than 1 jiffy passes, and the workqueue thread picks up
the work and starts running it. Then the kthread calls
flush_delayed_work. It sees that the list is empty and returns
immediately, even though the __fput for its file may not have run yet.
Close this by making flush_delayed_fput use flush_delayed_work instead,
which should immediately schedule the work to run if it's not already,
and block until the workqueue job completes.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
fs/file_table.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05ebf95..52cc6803c07a 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -244,6 +244,8 @@ static void ____fput(struct callback_head *work)
__fput(container_of(work, struct file, f_u.fu_rcuhead));
}
+static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
+
/*
* If kernel thread really needs to have the final fput() it has done
* to complete, call this. The only user right now is the boot - we
@@ -256,11 +258,9 @@ static void ____fput(struct callback_head *work)
*/
void flush_delayed_fput(void)
{
- delayed_fput(NULL);
+ flush_delayed_work(&delayed_fput_work);
}
-static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
-
void fput(struct file *file)
{
if (atomic_long_dec_and_test(&file->f_count)) {
--
2.4.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job
2015-09-17 12:39 ` [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
@ 2015-10-04 13:35 ` Jeff Layton
0 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2015-10-04 13:35 UTC (permalink / raw)
To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel
On Thu, 17 Sep 2015 08:39:44 -0400
Jeff Layton <jlayton@poochiereds.net> wrote:
> I think there's a potential race in flush_delayed_fput. A kthread does
> an fput() and that file gets added to the list and the delayed work is
> scheduled. More than 1 jiffy passes, and the workqueue thread picks up
> the work and starts running it. Then the kthread calls
> flush_delayed_work. It sees that the list is empty and returns
> immediately, even though the __fput for its file may not have run yet.
>
> Close this by making flush_delayed_fput use flush_delayed_work instead,
> which should immediately schedule the work to run if it's not already,
> and block until the workqueue job completes.
>
> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> ---
> fs/file_table.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
It should be noted that the only current user of flush_delayed_fput()
can call it before the workqueue threads are ever started.
Looking at the code, I *think* this will still do the right thing --
block until those threads are started and then flush the work as usual,
but do let me know if I've misread it.
> diff --git a/fs/file_table.c b/fs/file_table.c
> index ad17e05ebf95..52cc6803c07a 100644
> --- a/fs/file_table.c
> +++ b/fs/file_table.c
> @@ -244,6 +244,8 @@ static void ____fput(struct callback_head *work)
> __fput(container_of(work, struct file, f_u.fu_rcuhead));
> }
>
> +static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
> +
> /*
> * If kernel thread really needs to have the final fput() it has done
> * to complete, call this. The only user right now is the boot - we
> @@ -256,11 +258,9 @@ static void ____fput(struct callback_head *work)
> */
> void flush_delayed_fput(void)
> {
> - delayed_fput(NULL);
> + flush_delayed_work(&delayed_fput_work);
> }
>
> -static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
> -
> void fput(struct file *file)
> {
> if (atomic_long_dec_and_test(&file->f_count)) {
--
Jeff Layton <jlayton@poochiereds.net>
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH v2 2/4] fs: add a kerneldoc header to fput
2015-09-17 12:39 [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
2015-09-17 12:39 ` [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
@ 2015-09-17 12:39 ` Jeff Layton
2015-09-17 12:39 ` [PATCH v2 3/4] fs: add fput_queue Jeff Layton
2015-09-17 12:39 ` [PATCH v2 4/4] fs: export flush_delayed_fput Jeff Layton
3 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2015-09-17 12:39 UTC (permalink / raw)
To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel
...and move its EXPORT_SYMBOL just below the function.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
fs/file_table.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index 52cc6803c07a..8cfeaee6323f 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -261,6 +261,25 @@ void flush_delayed_fput(void)
flush_delayed_work(&delayed_fput_work);
}
+/**
+ * fput - put a struct file reference
+ * @file: file of which to put the reference
+ *
+ * This function decrements the reference count for the struct file reference,
+ * and queues it up for destruction if the count goes to zero. In the case of
+ * most tasks we queue it to the task_work infrastructure, which will be run
+ * just before the task returns back to userspace. kthreads however never
+ * return to userspace, so for those we add them to a global list and schedule
+ * a delayed workqueue job to do the final cleanup work.
+ *
+ * Why not just do it synchronously? __fput can involve taking locks of all
+ * sorts, and doing it synchronously means that the callers must take extra care
+ * not to deadlock. That can be very difficult to ensure, so by deferring it
+ * until just before return to userland or to the workqueue, we sidestep that
+ * nastiness. Also, __fput can be quite stack intensive, so doing a final fput
+ * has the possibility of blowing up if we don't take steps to ensure that we
+ * have enough stack space to make it work.
+ */
void fput(struct file *file)
{
if (atomic_long_dec_and_test(&file->f_count)) {
@@ -281,6 +300,7 @@ void fput(struct file *file)
schedule_delayed_work(&delayed_fput_work, 1);
}
}
+EXPORT_SYMBOL(fput);
/*
* synchronous analog of fput(); for kernel threads that might be needed
@@ -299,7 +319,6 @@ void __fput_sync(struct file *file)
}
}
-EXPORT_SYMBOL(fput);
void put_filp(struct file *file)
{
--
2.4.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 3/4] fs: add fput_queue
2015-09-17 12:39 [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
2015-09-17 12:39 ` [PATCH v2 1/4] fs: have flush_delayed_fput flush the workqueue job Jeff Layton
2015-09-17 12:39 ` [PATCH v2 2/4] fs: add a kerneldoc header to fput Jeff Layton
@ 2015-09-17 12:39 ` Jeff Layton
2015-09-17 12:39 ` [PATCH v2 4/4] fs: export flush_delayed_fput Jeff Layton
3 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2015-09-17 12:39 UTC (permalink / raw)
To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel
When nfsd caches a file, we want to be able to close it down in advance
of setlease attempts. Setting a lease is generally done at the behest of
userland, so we need a mechanism to ensure that a userland task can
completely close a file without having to return back to userspace.
To do this, we borrow the delayed_fput infrastructure that kthreads use.
fput_queue will queue to the delayed_fput list if the last reference was
put. The caller can then call flush_delayed_fput to ensure that the files
are completely closed before proceeding.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
fs/file_table.c | 27 +++++++++++++++++++++++++++
include/linux/file.h | 1 +
2 files changed, 28 insertions(+)
diff --git a/fs/file_table.c b/fs/file_table.c
index 8cfeaee6323f..95361d2b8a08 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -302,6 +302,33 @@ void fput(struct file *file)
}
EXPORT_SYMBOL(fput);
+/**
+ * fput_queue - do an fput without using task_work
+ * @file: file of which to put the reference
+ *
+ * When fput is called in the context of a userland process, it'll queue the
+ * actual work (__fput()) to be done just before returning to userland. In some
+ * cases however, we need to ensure that the __fput runs before that point.
+ * There is no safe way to flush work that has been queued via task_work_add
+ * however, so to do this we borrow the delayed_fput infrastructure that
+ * kthreads use. The userland process can use fput_queue() on one or more
+ * struct files and then call flush_delayed_fput() to ensure that they are
+ * completely closed before proceeding.
+ *
+ * Returns true if the final fput was done, false otherwise. The caller can
+ * use this to determine whether to flush_delayed_fput afterward.
+ */
+bool fput_queue(struct file *file)
+{
+ if (atomic_long_dec_and_test(&file->f_count)) {
+ if (llist_add(&file->f_u.fu_llist, &delayed_fput_list))
+ schedule_delayed_work(&delayed_fput_work, 1);
+ return true;
+ }
+ return false;
+}
+EXPORT_SYMBOL(fput_queue);
+
/*
* synchronous analog of fput(); for kernel threads that might be needed
* in some umount() (and thus can't use flush_delayed_fput() without
diff --git a/include/linux/file.h b/include/linux/file.h
index f87d30882a24..f9308c9a0746 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -12,6 +12,7 @@
struct file;
extern void fput(struct file *);
+extern bool fput_queue(struct file *);
struct file_operations;
struct vfsmount;
--
2.4.3
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH v2 4/4] fs: export flush_delayed_fput
2015-09-17 12:39 [PATCH v2 0/4] fs: allow userland tasks to use delayed_fput infrastructure Jeff Layton
` (2 preceding siblings ...)
2015-09-17 12:39 ` [PATCH v2 3/4] fs: add fput_queue Jeff Layton
@ 2015-09-17 12:39 ` Jeff Layton
3 siblings, 0 replies; 6+ messages in thread
From: Jeff Layton @ 2015-09-17 12:39 UTC (permalink / raw)
To: Al Viro; +Cc: bfields, linux-nfs, linux-fsdevel, linux-kernel
...and clean up the comments over it a bit. The nfsd code will need to
be able to call back into this.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
---
fs/file_table.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/fs/file_table.c b/fs/file_table.c
index 95361d2b8a08..899c19687cfa 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -246,20 +246,24 @@ static void ____fput(struct callback_head *work)
static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
-/*
- * If kernel thread really needs to have the final fput() it has done
- * to complete, call this. The only user right now is the boot - we
- * *do* need to make sure our writes to binaries on initramfs has
- * not left us with opened struct file waiting for __fput() - execve()
- * won't work without that. Please, don't add more callers without
- * very good reasons; in particular, never call that with locks
- * held and never call that from a thread that might need to do
- * some work on any kind of umount.
+/**
+ * flush_delayed_fput - ensure that all delayed_fput work is complete
+ *
+ * If kernel thread or task that has used fput_queue really needs to have the
+ * final fput() it has done to complete, call this. One of the main users is
+ * the boot - we *do* need to make sure our writes to binaries on initramfs has
+ * not left us with opened struct file waiting for __fput() - execve() won't
+ * work without that.
+ *
+ * Please, don't add more callers without very good reasons; in particular,
+ * never call that with locks held and never from a thread that might need to
+ * do some work on any kind of umount.
*/
void flush_delayed_fput(void)
{
flush_delayed_work(&delayed_fput_work);
}
+EXPORT_SYMBOL(flush_delayed_fput);
/**
* fput - put a struct file reference
--
2.4.3
^ permalink raw reply related [flat|nested] 6+ messages in thread