LKML Archive on lore.kernel.org
 help / color / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	Russell King - ARM Linux admin <linux@armlinux.org.uk>,
	Peter Zijlstra <peterz@infradead.org>,
	Chris Metcalf <cmetcalf@ezchip.com>,
	Christoph Lameter <cl@linux.com>, Kirill Tkhai <tkhai@yandex.ru>,
	Mike Galbraith <efault@gmx.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>,
	Davidlohr Bueso <dave@stgolabs.net>
Subject: [PATCH 2/3] task: RCU protect tasks on the runqueue
Date: Mon, 02 Sep 2019 23:52:01 -0500
Message-ID: <878sr6t21a.fsf_-_@x220.int.ebiederm.org> (raw)
In-Reply-To: <87k1aqt23r.fsf_-_@x220.int.ebiederm.org> (Eric W. Biederman's message of "Mon, 02 Sep 2019 23:50:32 -0500")


In the ordinary case today the rcu grace period of a task comes when a
task is reaped, well after the task has left the runqueue.  This
change guarantees that the rcu grace period always happens after a
task has left the runqueue.  As this is something that usaually happens
today I do not expect any code correctness problems with this change.
At most I anticipate timing challenges.

The only code that will run later are the functions
perf_event_delayed_put, and trace-sched_process_free.  The function
perf_event_delayed_put in the final analysis is just a WARN_ON for
cases that I assume should never happen.  So I don't see any problem
with delaying it.

The function trace_sched_process_free is a trace point and thus user
space visible.   The strangest dependencies can happen but short
of the bizarre it appears to me that trace_sched_process_free is
getting a slightly more accurate picture of when a task struct
is free.  As it is now guaranteed that the process will not be
on the runqueue.

Resources for a process are freed in release_task or in __put_task_struct
when the reference count goes to 0.  Both of which are happening at
effectively the same time as before.  The rcu grace period is just
potentially happening a little bit later in the timeline.

In the common case of a process being reaped after it leaves the
runqueue everything will happen exactly as before.

In the case where a task self reaps we are pretty much guaranteed that
the rcu grace period is delayed.  So we should get quite a bit of
coverage in of this worst case for the change in a normal threaded
workload.  So I expect any issues to turn up quickly or not at all.

I have lightly tested this change and everything appears to work
fine.

Inspired-by: Linus Torvalds <torvalds@linux-foundation.org>
Inspired-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/fork.c       | 11 +++++++----
 kernel/sched/core.c |  7 ++++---
 2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 9f04741d5c70..7a74ade4e7d6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -900,10 +900,13 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 	if (orig->cpus_ptr == &orig->cpus_mask)
 		tsk->cpus_ptr = &tsk->cpus_mask;
 
-	/* One for the user space visible state that goes away when reaped. */
-	refcount_set(&tsk->rcu_users, 1);
-	/* One for the rcu users, and one for the scheduler */
-	refcount_set(&tsk->usage, 2);
+	/*
+	 * One for the user space visible state that goes away when reaped.
+	 * One for the scheduler.
+	 */
+	refcount_set(&tsk->rcu_users, 2);
+	/* One for the rcu users */
+	refcount_set(&tsk->usage, 1);
 #ifdef CONFIG_BLK_DEV_IO_TRACE
 	tsk->btrace_seq = 0;
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2b037f195473..802958407369 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3135,7 +3135,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		/* Task is done with its stack. */
 		put_task_stack(prev);
 
-		put_task_struct(prev);
+		put_task_struct_rcu_user(prev);
 	}
 
 	tick_nohz_task_switch();
@@ -3857,7 +3857,7 @@ static void __sched notrace __schedule(bool preempt)
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
-		rq->curr = next;
+		rcu_assign_pointer(rq->curr, next);
 		/*
 		 * The membarrier system call requires each architecture
 		 * to have a full memory barrier after updating
@@ -5863,7 +5863,8 @@ void init_idle(struct task_struct *idle, int cpu)
 	__set_task_cpu(idle, cpu);
 	rcu_read_unlock();
 
-	rq->curr = rq->idle = idle;
+	rq->idle = idle;
+	rcu_assign_pointer(rq->curr, idle);
 	idle->on_rq = TASK_ON_RQ_QUEUED;
 #ifdef CONFIG_SMP
 	idle->on_cpu = 1;
-- 
2.21.0.dirty


  parent reply index

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-30 14:08 [BUG] Use of probe_kernel_address() in task_rcu_dereference() without checking return value Russell King - ARM Linux admin
2019-08-30 15:24 ` Oleg Nesterov
2019-08-30 15:30 ` Linus Torvalds
2019-08-30 15:40   ` Russell King - ARM Linux admin
2019-08-30 15:43     ` Linus Torvalds
2019-08-30 15:41   ` Linus Torvalds
2019-08-30 16:09     ` Oleg Nesterov
2019-08-30 16:21       ` Linus Torvalds
2019-08-30 16:44         ` Oleg Nesterov
2019-08-30 16:58           ` Linus Torvalds
2019-08-30 19:36         ` Eric W. Biederman
2019-09-02 13:40           ` Oleg Nesterov
2019-09-02 13:53             ` Peter Zijlstra
2019-09-02 14:44               ` Oleg Nesterov
2019-09-02 16:20                 ` Peter Zijlstra
2019-09-02 17:04             ` Eric W. Biederman
2019-09-02 17:34               ` Linus Torvalds
2019-09-03  4:50                 ` [PATCH 0/3] task: Making tasks on the runqueue rcu protected Eric W. Biederman
2019-09-03  4:51                   ` [PATCH 1/3] task: Add a count of task rcu users Eric W. Biederman
2019-09-04 14:36                     ` Oleg Nesterov
2019-09-04 14:44                     ` Frederic Weisbecker
2019-09-04 15:32                       ` Oleg Nesterov
2019-09-04 16:33                         ` Frederic Weisbecker
2019-09-04 18:20                           ` Linus Torvalds
2019-09-05 14:59                             ` Frederic Weisbecker
2019-09-03  4:52                   ` Eric W. Biederman [this message]
2019-09-03  7:41                     ` [PATCH 2/3] task: RCU protect tasks on the runqueue Peter Zijlstra
2019-09-03  7:47                       ` Peter Zijlstra
2019-09-03 16:44                         ` Eric W. Biederman
2019-09-03 17:08                           ` Linus Torvalds
2019-09-03 18:13                             ` Eric W. Biederman
2019-09-03 19:18                               ` Linus Torvalds
2019-09-03 20:06                                 ` Peter Zijlstra
2019-09-03 21:32                                   ` Paul E. McKenney
2019-09-05 20:02                                     ` Eric W. Biederman
2019-09-05 20:55                                       ` Paul E. McKenney
2019-09-06  7:07                                       ` Peter Zijlstra
2019-09-09 12:22                                         ` Eric W. Biederman
2019-09-25  7:36                                           ` Peter Zijlstra
2019-09-27  8:10                                   ` [tip: sched/urgent] tasks, sched/core: RCUify the assignment of rq->curr tip-bot2 for Eric W. Biederman
2019-09-03 19:42                               ` [PATCH 2/3] task: RCU protect tasks on the runqueue Peter Zijlstra
2019-09-14 12:31                           ` [PATCH v2 1/4] task: Add a count of task rcu users Eric W. Biederman
2019-09-14 12:31                           ` [PATCH v2 2/4] task: Ensure tasks are available for a grace period after leaving the runqueue Eric W. Biederman
2019-09-14 12:32                           ` [PATCH v2 3/4] task: With a grace period after finish_task_switch, remove unnecessary code Eric W. Biederman
2019-09-04 14:22                     ` [PATCH 2/3] task: RCU protect tasks on the runqueue Frederic Weisbecker
2019-09-03  4:52                   ` [PATCH 3/3] task: Clean house now that tasks on the runqueue are rcu protected Eric W. Biederman
2019-09-03  9:45                     ` kbuild test robot
2019-09-03 13:06                     ` Oleg Nesterov
2019-09-03 13:58                   ` [PATCH 0/3] task: Making tasks on the runqueue " Oleg Nesterov
2019-09-03 15:44                   ` Linus Torvalds
2019-09-03 19:46                     ` Peter Zijlstra
     [not found]                   ` <87muf7f4bf.fsf_-_@x220.int.ebiederm.org>
2019-09-14 12:33                     ` [PATCH v2 1/4] task: Add a count of task rcu users Eric W. Biederman
2019-09-15 13:54                       ` Paul E. McKenney
2019-09-27  8:10                       ` [tip: sched/urgent] tasks: Add a count of task RCU users tip-bot2 for Eric W. Biederman
2019-09-14 12:33                     ` [PATCH v2 2/4] task: Ensure tasks are available for a grace period after leaving the runqueue Eric W. Biederman
2019-09-15 14:07                       ` Paul E. McKenney
2019-09-15 14:09                         ` Paul E. McKenney
2019-09-27  8:10                       ` [tip: sched/urgent] tasks, sched/core: " tip-bot2 for Eric W. Biederman
2019-09-14 12:34                     ` [PATCH v2 3/4] task: With a grace period after finish_task_switch, remove unnecessary code Eric W. Biederman
2019-09-15 14:32                       ` Paul E. McKenney
2019-09-15 17:07                         ` Linus Torvalds
2019-09-15 18:47                           ` Paul E. McKenney
2019-09-27  8:10                       ` [tip: sched/urgent] tasks, sched/core: With a grace period after finish_task_switch(), " tip-bot2 for Eric W. Biederman
2019-09-14 12:35                     ` [PATCH v2 4/4] task: RCUify the assignment of rq->curr Eric W. Biederman
2019-09-15 14:41                       ` Paul E. McKenney
2019-09-15 17:59                         ` Eric W. Biederman
2019-09-15 18:25                           ` Eric W. Biederman
2019-09-15 18:48                             ` Paul E. McKenney
2019-09-20 23:02                       ` Frederic Weisbecker
2019-09-26  1:49                         ` Eric W. Biederman
2019-09-26 12:42                           ` Frederic Weisbecker
2019-09-14 17:43                     ` [PATCH v2 0/4] task: Making tasks on the runqueue rcu protected Linus Torvalds
2019-09-17 17:38                       ` Eric W. Biederman
2019-09-25  7:51                         ` Peter Zijlstra
2019-09-26  1:11                           ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878sr6t21a.fsf_-_@x220.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=cl@linux.com \
    --cc=cmetcalf@ezchip.com \
    --cc=dave@stgolabs.net \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tkhai@yandex.ru \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git