From: "Paul E. McKenney" <paulmck@kernel.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Oleg Nesterov <oleg@redhat.com>,
Russell King - ARM Linux admin <linux@armlinux.org.uk>,
Chris Metcalf <cmetcalf@ezchip.com>,
Christoph Lameter <cl@linux.com>, Kirill Tkhai <tkhai@yandex.ru>,
Mike Galbraith <efault@gmx.de>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@kernel.org>,
Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH v2 2/4] task: Ensure tasks are available for a grace period after leaving the runqueue
Date: Sun, 15 Sep 2019 07:07:52 -0700
Message-ID: <20190915140752.GJ30224@paulmck-ThinkPad-P72>
In-Reply-To: <87r24jdpl5.fsf_-_@x220.int.ebiederm.org>

On Sat, Sep 14, 2019 at 07:33:58AM -0500, Eric W. Biederman wrote:
>
> In the ordinary case today the rcu grace period for a task_struct is
> triggered when another process waits for its zombie and causes the
> kernel to call release_task(). As the waiting task has to receive a
> signal and then act upon it before this happens, typically this will
> occur after the original task has been removed from the runqueue.
>
> Unfortunately in some cases, such as self-reaping tasks, it can be shown
> that release_task() will be called, starting the grace period for
> task_struct, long before the task leaves the runqueue.
>
> Therefore use put_task_struct_rcu_user in finish_task_switch to
> guarantee that there is an rcu lifetime after the task
> leaves the runqueue.
>
> Besides the change in the start of the rcu grace period for the
> task_struct this change may delay perf_event_delayed_put and
> trace_sched_process_free. The function perf_event_delayed_put boils
> down to just a WARN_ON for cases that I assume should never happen. So
> I don't see any problem with delaying it.
>
> The function trace_sched_process_free is a trace point and thus
> visible to user space. Occasionally userspace has the strangest
> dependencies so this has a minuscule chance of causing a regression.
> This change only changes the timing of when the tracepoint is called.
> The change in timing arguably gives userspace a more accurate picture
> of what is going on. So I don't expect there to be a regression.
>
> In the case where a task self-reaps we are pretty much guaranteed that
> the rcu grace period is delayed. So we should get quite a bit of
> coverage of this worst case for the change in a normal threaded
> workload. So I expect any issues to turn up quickly or not at all.
>
> I have lightly tested this change and everything appears to work
> fine.
>
> Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
> Inspired-by: Oleg Nesterov <oleg@redhat.com>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> ---
> kernel/fork.c | 11 +++++++----
> kernel/sched/core.c | 2 +-
> 2 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9f04741d5c70..7a74ade4e7d6 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -900,10 +900,13 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  	if (orig->cpus_ptr == &orig->cpus_mask)
>  		tsk->cpus_ptr = &tsk->cpus_mask;
>
> -	/* One for the user space visible state that goes away when reaped. */
> -	refcount_set(&tsk->rcu_users, 1);
> -	/* One for the rcu users, and one for the scheduler */
> -	refcount_set(&tsk->usage, 2);
> +	/*
> +	 * One for the user space visible state that goes away when reaped.
> +	 * One for the scheduler.
> +	 */
> +	refcount_set(&tsk->rcu_users, 2);

OK, this would allow us to add a later decrement-and-test of
->rcu_users ...

> +	/* One for the rcu users */
> +	refcount_set(&tsk->usage, 1);
>  #ifdef CONFIG_BLK_DEV_IO_TRACE
>  	tsk->btrace_seq = 0;
>  #endif
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2b037f195473..69015b7c28da 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3135,7 +3135,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  		/* Task is done with its stack. */
>  		put_task_stack(prev);
>
> -		put_task_struct(prev);
> +		put_task_struct_rcu_user(prev);

... which is here.  And this looks to be invoked from the __schedule()
called from do_task_dead() at the very end of do_exit().
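
As a rough way to picture the counting, the following is a userspace
model only, not kernel code. The names struct task_model, model_init(),
model_put_task() and model_put_rcu_user() are invented for illustration.
It assumes that the last ->rcu_users decrement is what drops the single
->usage reference held for the rcu users, and it drops that reference
immediately where the kernel would defer it via call_rcu() until after a
grace period.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; this is not the kernel's task_struct. */
struct task_model {
	atomic_int rcu_users;	/* user-visible state + scheduler */
	atomic_int usage;	/* one reference held for the rcu users */
	bool freed;		/* recorded instead of calling free() */
};

static void model_init(struct task_model *t)
{
	atomic_init(&t->rcu_users, 2);
	atomic_init(&t->usage, 1);
	t->freed = false;
}

/* Stand-in for put_task_struct(): drop one ->usage reference. */
static void model_put_task(struct task_model *t)
{
	int old = atomic_fetch_sub(&t->usage, 1);

	assert(old > 0);
	if (old == 1)
		t->freed = true;
}

/*
 * Stand-in for put_task_struct_rcu_user().  In the kernel the final
 * decrement would hand off to call_rcu(), so the ->usage reference is
 * dropped only after a grace period; this model drops it immediately.
 */
static void model_put_rcu_user(struct task_model *t)
{
	int old = atomic_fetch_sub(&t->rcu_users, 1);

	assert(old > 0);
	if (old == 1)
		model_put_task(t);
}
```

Calling model_put_rcu_user() twice, once standing in for release_task()
and once for finish_task_switch(), marks the model freed only on the
second call, whichever of the two arrives first.
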

This looks plausible, but still requires that it no longer be possible to
enter an RCU read-side critical section that might increment ->rcu_users
after this point in time.  This might be enforced by a grace period
between the time that the task was removed from its lists and the current
time (seems unlikely, though: in that case, why bother with call_rcu()?) or
by some other synchronization.

On to the next patch!

							Thanx, Paul

>  	}
>
>  	tick_nohz_task_switch();
> --
> 2.21.0.dirty
>