* [PATCH 0/3] task_work: restore fifo ordering guarantee
@ 2015-09-08 17:14 Oleg Nesterov
  2015-09-08 17:14 ` [PATCH 1/3] fput: don't abuse task_work_add() when possible Oleg Nesterov
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Oleg Nesterov @ 2015-09-08 17:14 UTC (permalink / raw)
  To: Al Viro, Eric Dumazet, Linus Torvalds
  Cc: Andrew Morton, Ingo Molnar, Maciej Żenczykowski,
	Peter Zijlstra, Thomas Gleixner, linux-kernel

OK, nobody replied, so I will spam you again. Modulo some cosmetic changes
this is the same patch, now with a changelog, and this time I have tried
to test it.

Eric, Al, Linus, I would appreciate any comments. I still disagree with
the recent commit c82199061009 ("task_work: remove fifo ordering guarantee").

I am not very sure about 2/3, so it comes as a separate change.

I used this trivial test-case

	#include <stdio.h>
	#include <unistd.h>
	#include <sys/wait.h>
	#include <stdlib.h>
	#include <fcntl.h>
	#include <assert.h>

	int main(int argc, const char *argv[])
	{
		int nfork = atoi(argv[1]);
		int nopen = atoi(argv[2]);

		while (nfork--) {
			if (fork()) {
				wait(NULL);
				continue;
			}

			while (nopen--)
				assert(open("/dev/null", O_RDONLY) >= 0);
			break;
		}

		return 0;
	}

to test the performance, and I see the same numbers with or without this
series. Actually, the numbers look a little bit better when I run
"time ./o 10 1000000", but most probably this is just noise.

Please review.

Oleg.

 fs/file_table.c    |   46 +++++++++++++++++++++++++++++++++++++++-------
 include/linux/fs.h |    5 ++++-
 kernel/task_work.c |   12 ++++++++++--
 3 files changed, 53 insertions(+), 10 deletions(-)



* [PATCH 1/3] fput: don't abuse task_work_add() when possible
  2015-09-08 17:14 [PATCH 0/3] task_work: restore fifo ordering guarantee Oleg Nesterov
@ 2015-09-08 17:14 ` Oleg Nesterov
  2015-09-08 17:14 ` [PATCH 2/3] fput: move ->f_next_put into a union with ->f_version Oleg Nesterov
  2015-09-08 17:14 ` [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee" Oleg Nesterov
  2 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2015-09-08 17:14 UTC (permalink / raw)
  To: Al Viro, Eric Dumazet, Linus Torvalds
  Cc: Andrew Morton, Ingo Molnar, Maciej Żenczykowski,
	Peter Zijlstra, Thomas Gleixner, linux-kernel

The commit c82199061009 ("task_work: remove fifo ordering guarantee")
removed the "Reverse the list" loop because the ->task_works list can
be huge if (say) a process with 2,000,000 files is killed.

However, imo this doesn't really fix the problem: we do not want this
list to be huge in the first place. For example, suppose that
keyctl_session_to_parent() races with an exiting parent which has a lot
of opened files. In this case task_work_cancel() will spend the same time
walking the list, but with irqs disabled and tasklist_lock/pi_lock held.
Yes, this is very unlikely, but it still does not look good imho. Plus,
out-of-tree modules like systemtap can (more likely) hit this problem too.

And I don't think that "remove fifo ordering" is the right thing, see
the next change.

With this patch, fput(file) checks the last queued work; if it is also
the ____fput() callback, it simply adds this "file" to the list processed
by ____fput(). This adds a new ->f_next_put member to "struct file", but
hopefully it can share memory with another member, see the next patch.
This way the exiting task will likely call task_work_add(____fput) only
once, so ->task_works shouldn't grow too much and we can probably even
remove cond_resched() in task_work_run().

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 fs/file_table.c    | 37 ++++++++++++++++++++++++++++++-------
 include/linux/fs.h |  1 +
 2 files changed, 31 insertions(+), 7 deletions(-)

diff --git a/fs/file_table.c b/fs/file_table.c
index ad17e05..8b91ef9 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -41,9 +41,12 @@ static struct kmem_cache *filp_cachep __read_mostly;
 
 static struct percpu_counter nr_files __cacheline_aligned_in_smp;
 
+#define rcuhead_to_file(head) \
+	container_of(head, struct file, f_u.fu_rcuhead)
+
 static void file_free_rcu(struct rcu_head *head)
 {
-	struct file *f = container_of(head, struct file, f_u.fu_rcuhead);
+	struct file *f = rcuhead_to_file(head);
 
 	put_cred(f->f_cred);
 	kmem_cache_free(filp_cachep, f);
@@ -239,11 +242,6 @@ static void delayed_fput(struct work_struct *unused)
 	}
 }
 
-static void ____fput(struct callback_head *work)
-{
-	__fput(container_of(work, struct file, f_u.fu_rcuhead));
-}
-
 /*
  * If kernel thread really needs to have the final fput() it has done
  * to complete, call this.  The only user right now is the boot - we
@@ -261,15 +259,40 @@ void flush_delayed_fput(void)
 
 static DECLARE_DELAYED_WORK(delayed_fput_work, delayed_fput);
 
+static void ____fput(struct callback_head *work)
+{
+	struct file *file = rcuhead_to_file(work);
+
+	do {
+		struct file *next = READ_ONCE(file->f_next_put);
+		__fput(file);
+		cond_resched();
+		file = next;
+
+	} while (file);
+}
+
 void fput(struct file *file)
 {
 	if (atomic_long_dec_and_test(&file->f_count)) {
 		struct task_struct *task = current;
 
 		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
+			struct callback_head *work = READ_ONCE(task->task_works);
+
+			/* avoid task_work_add() below if it is already pending */
+			if (work && work->func == ____fput) {
+				struct file *prev = rcuhead_to_file(work);
+				file->f_next_put = prev->f_next_put;
+				prev->f_next_put = file;
+				return;
+			}
+
 			init_task_work(&file->f_u.fu_rcuhead, ____fput);
-			if (!task_work_add(task, &file->f_u.fu_rcuhead, true))
+			if (!task_work_add(task, &file->f_u.fu_rcuhead, true)) {
+				file->f_next_put = NULL;
 				return;
+			}
 			/*
 			 * After this task has run exit_task_work(),
 			 * task_work_add() will fail.  Fall through to delayed
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8514e65..3941b86 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -861,6 +861,7 @@ struct file {
 	struct file_ra_state	f_ra;
 
 	u64			f_version;
+	struct file		*f_next_put;
 #ifdef CONFIG_SECURITY
 	void			*f_security;
 #endif
-- 
2.4.3



* [PATCH 2/3] fput: move ->f_next_put into a union with ->f_version
  2015-09-08 17:14 [PATCH 0/3] task_work: restore fifo ordering guarantee Oleg Nesterov
  2015-09-08 17:14 ` [PATCH 1/3] fput: don't abuse task_work_add() when possible Oleg Nesterov
@ 2015-09-08 17:14 ` Oleg Nesterov
  2015-09-08 17:14 ` [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee" Oleg Nesterov
  2 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2015-09-08 17:14 UTC (permalink / raw)
  To: Al Viro, Eric Dumazet, Linus Torvalds
  Cc: Andrew Morton, Ingo Molnar, Maciej Żenczykowski,
	Peter Zijlstra, Thomas Gleixner, linux-kernel

file->f_next_put can share memory with file->f_version (I hope).

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 include/linux/fs.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3941b86..58faad6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -860,8 +860,10 @@ struct file {
 	const struct cred	*f_cred;
 	struct file_ra_state	f_ra;
 
-	u64			f_version;
-	struct file		*f_next_put;
+	union {
+		u64		f_version;
+		struct file	*f_next_put;
+	};
 #ifdef CONFIG_SECURITY
 	void			*f_security;
 #endif
-- 
2.4.3



* [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee"
  2015-09-08 17:14 [PATCH 0/3] task_work: restore fifo ordering guarantee Oleg Nesterov
  2015-09-08 17:14 ` [PATCH 1/3] fput: don't abuse task_work_add() when possible Oleg Nesterov
  2015-09-08 17:14 ` [PATCH 2/3] fput: move ->f_next_put into a union with ->f_version Oleg Nesterov
@ 2015-09-08 17:14 ` Oleg Nesterov
  2015-09-08 17:39   ` Linus Torvalds
  2 siblings, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2015-09-08 17:14 UTC (permalink / raw)
  To: Al Viro, Eric Dumazet, Linus Torvalds
  Cc: Andrew Morton, Ingo Molnar, Maciej Żenczykowski,
	Peter Zijlstra, Thomas Gleixner, linux-kernel

This reverts commit c82199061009d1561e31e17fca5e47a87cb7ff4c.

Now that fput() can't abuse the ->task_works list, we can restore the FIFO
ordering. Yes, currently there are no in-kernel users which need this,
but I think task_work_add() will gain more users, and FIFO makes more
sense if (unlike fput/mntput) the callbacks change the task's state.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
 kernel/task_work.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index 53fa971..8727032 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -18,8 +18,6 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
  * This is like the signal handler which runs in kernel mode, but it doesn't
  * try to wake up the @task.
  *
- * Note: there is no ordering guarantee on works queued here.
- *
  * RETURNS:
  * 0 if succeeds or -ESRCH.
  */
@@ -110,6 +108,16 @@ void task_work_run(void)
 		raw_spin_unlock_wait(&task->pi_lock);
 		smp_mb();
 
+		/* Reverse the list to run the works in fifo order */
+		head = NULL;
+		do {
+			next = work->next;
+			work->next = head;
+			head = work;
+			work = next;
+		} while (work);
+
+		work = head;
 		do {
 			next = work->next;
 			work->func(work);
-- 
2.4.3



* Re: [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee"
  2015-09-08 17:14 ` [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee" Oleg Nesterov
@ 2015-09-08 17:39   ` Linus Torvalds
  2015-09-08 17:41     ` Linus Torvalds
  2015-09-09 13:16     ` Oleg Nesterov
  0 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2015-09-08 17:39 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Eric Dumazet, Andrew Morton, Ingo Molnar,
	Maciej Żenczykowski, Peter Zijlstra, Thomas Gleixner,
	Linux Kernel Mailing List

On Tue, Sep 8, 2015 at 10:14 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Now that fput() can't abuse ->task_works list, we can restore the FIFO
> ordering. Yes, currently there are no in-kernel users which need this,
> but I think task_work_add() will have more users and FIFO makes more
> sense if (unlike fput/mntput) the callbacks change the task's state.

So quite frankly, regardless of the other patches, I'd almost rather
see the workqueue not being ordered. I don't think anybody pointed at
any code that could possibly care. And if nobody cares, why add the
code and the CPU cycles to do this?

The other patches I do like - why add those list operations that are
just guaranteed to mess up the cache to add the file descriptors to
the workqueue list, when we can just do the operation directly? So I
think that's a separate issue.

               Linus


* Re: [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee"
  2015-09-08 17:39   ` Linus Torvalds
@ 2015-09-08 17:41     ` Linus Torvalds
  2015-09-09 13:16     ` Oleg Nesterov
  1 sibling, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2015-09-08 17:41 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Eric Dumazet, Andrew Morton, Ingo Molnar,
	Maciej Żenczykowski, Peter Zijlstra, Thomas Gleixner,
	Linux Kernel Mailing List

On Tue, Sep 8, 2015 at 10:39 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The other patches I do like - why add those list operations that are
> just guaranteed to mess up the cache to add the file descriptors to
> the workqueue list, when we can just do the operation directly?

Side note, it's not just messing with the cache unnecessarily when the
lists grow long, the "cmpxchg" involved in adding the work is fairly
expensive. It may be an unlocked operation, but it's generally as
expensive as a locked one - just without the possibility of a
deadlock.

               Linus


* Re: [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee"
  2015-09-08 17:39   ` Linus Torvalds
  2015-09-08 17:41     ` Linus Torvalds
@ 2015-09-09 13:16     ` Oleg Nesterov
  2015-09-09 16:17       ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Oleg Nesterov @ 2015-09-09 13:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Eric Dumazet, Andrew Morton, Ingo Molnar,
	Maciej Żenczykowski, Peter Zijlstra, Thomas Gleixner,
	Linux Kernel Mailing List

Sorry for the delay,

On 09/08, Linus Torvalds wrote:
>
> On Tue, Sep 8, 2015 at 10:14 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Now that fput() can't abuse ->task_works list, we can restore the FIFO
> > ordering. Yes, currently there are no in-kernel users which need this,
> > but I think task_work_add() will have more users and FIFO makes more
> > sense if (unlike fput/mntput) the callbacks change the task's state.
>
> So quite frankly, regardless of the other patches, I'd almost rather
> see the workqueue not being ordered. I don't think anybody pointed at
> any code that could possibly care. And if nobody cares, why add the
> code and the CPU cycles to do this?

Currently nobody cares, yes. IIRC, not even the out-of-tree code I know
about, although I didn't recheck.

Again, rightly or not I believe that FIFO makes task_work_add() more useful.
Perhaps I am wrong, so far I can only provide the artificial examples...

To me this does not differ from, say, stop_one_cpu_nowait(). I would be
surprised if it wasn't FIFO.

At least this should be cheap after 1/3. And in any case, the time we spend
in the "reverse" loop is nothing compared to the next loop, which actually
runs the callbacks.

Thanks. Let's see what Al thinks...

Oleg.



* Re: [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee"
  2015-09-09 13:16     ` Oleg Nesterov
@ 2015-09-09 16:17       ` Linus Torvalds
  2015-09-09 16:43         ` Oleg Nesterov
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2015-09-09 16:17 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Al Viro, Eric Dumazet, Andrew Morton, Ingo Molnar,
	Maciej Żenczykowski, Peter Zijlstra, Thomas Gleixner,
	Linux Kernel Mailing List

On Wed, Sep 9, 2015 at 6:16 AM, Oleg Nesterov <oleg@redhat.com> wrote:
>
> Again, rightly or not I believe that FIFO makes task_work_add() more useful.
> Perhaps I am wrong, so far I can only provide the artificial examples...

I'd rather wait until somebody has a real use case. I hate adding
infrastructure for "what if.." scenarios. We're better off if we can
make do with minimal semantics (ie "there are no guarantees except
that the work will be done before returning to user space") than with
stronger semantics that people then perhaps start depending on even if
they didn't really need them.

                  Linus


* Re: [PATCH 3/3] Revert "task_work: remove fifo ordering guarantee"
  2015-09-09 16:17       ` Linus Torvalds
@ 2015-09-09 16:43         ` Oleg Nesterov
  0 siblings, 0 replies; 9+ messages in thread
From: Oleg Nesterov @ 2015-09-09 16:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Al Viro, Eric Dumazet, Andrew Morton, Ingo Molnar,
	Maciej Żenczykowski, Peter Zijlstra, Thomas Gleixner,
	Linux Kernel Mailing List

On 09/09, Linus Torvalds wrote:
>
> On Wed, Sep 9, 2015 at 6:16 AM, Oleg Nesterov <oleg@redhat.com> wrote:
> >
> > Again, rightly or not I believe that FIFO makes task_work_add() more useful.
> > Perhaps I am wrong, so far I can only provide the artificial examples...
>
> I'd rather wait until somebody has a real use case. I hate adding
> infrastructure for "what if.." scenarios. We're better off if we can
> make do with minimal semantics (ie "there are no guarantees except
> that the work will be done before returning to user space") than with
> stronger semantics that people then perhaps start depending on even if
> they didn't really need them.

OK, I see. Thanks.

At least you seem to agree with 1-2, so if Al takes these changes we
can easily reconsider 3/3 later, if/when we have a new user which
needs FIFO.

Oleg.


