linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "tip-bot2 for Eric W. Biederman" <tip-bot2@linutronix.de>
To: linux-tip-commits@vger.kernel.org
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Chris Metcalf <cmetcalf@ezchip.com>,
	Christoph Lameter <cl@linux.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Kirill Tkhai <tkhai@yandex.ru>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Mike Galbraith <efault@gmx.de>, Oleg Nesterov <oleg@redhat.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	"Russell King - ARM Linux admin" <linux@armlinux.org.uk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@alien8.de>,
	linux-kernel@vger.kernel.org
Subject: [tip: sched/urgent] tasks, sched/core: RCUify the assignment of rq->curr
Date: Fri, 27 Sep 2019 08:10:44 -0000	[thread overview]
Message-ID: <156957184417.9866.5061690782234958008.tip-bot2@tip-bot2> (raw)
In-Reply-To: <20190903200603.GW2349@hirez.programming.kicks-ass.net>

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     5311a98fef7d0dc2e8040ae0e18f5568d6d1dd5a
Gitweb:        https://git.kernel.org/tip/5311a98fef7d0dc2e8040ae0e18f5568d6d1dd5a
Author:        Eric W. Biederman <ebiederm@xmission.com>
AuthorDate:    Sat, 14 Sep 2019 07:35:02 -05:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Wed, 25 Sep 2019 17:42:29 +02:00

tasks, sched/core: RCUify the assignment of rq->curr

The current task on the runqueue is currently read with rcu_dereference().

To obtain ordinary RCU semantics for an rcu_dereference() of rq->curr it needs
to be paired with rcu_assign_pointer() of rq->curr.  Which provides the
memory barrier necessary to order assignments to the task_struct
and the assignment to rq->curr.

Unfortunately the assignment of rq->curr in __schedule is a hot path,
and it has already been show that additional barriers in that code
will reduce the performance of the scheduler.  So I will attempt to
describe below why you can effectively have ordinary RCU semantics
without any additional barriers.

The assignment of rq->curr in init_idle is a slow path called once
per cpu and that can use rcu_assign_pointer() without any concerns.

As I write this there are effectively two users of rcu_dereference() on
rq->curr.  There is the membarrier code in kernel/sched/membarrier.c
that only looks at "->mm" after the rcu_dereference().  Then there is
task_numa_compare() in kernel/sched/fair.c.  My best reading of the
code shows that task_numa_compare only access: "->flags",
"->cpus_ptr", "->numa_group", "->numa_faults[]",
"->total_numa_faults", and "->se.cfs_rq".

The code in __schedule() essentially does:
	rq_lock(...);
	smp_mb__after_spinlock();

	next = pick_next_task(...);
	rq->curr = next;

	context_switch(prev, next);

At the start of the function the rq_lock/smp_mb__after_spinlock
pair provides a full memory barrier.  Further there is a full memory barrier
in context_switch().

This means that any task that has already run and modified itself (the
common case) has already seen two memory barriers before __schedule()
runs and begins executing.  A task that modifies itself then sees a
third full memory barrier pair with the rq_lock();

For a brand new task that is enqueued with wake_up_new_task() there
are the memory barriers present from the taking and release the
pi_lock and the rq_lock as the processes is enqueued as well as the
full memory barrier at the start of __schedule() assuming __schedule()
happens on the same cpu.

This means that by the time we reach the assignment of rq->curr
except for values on the task struct modified in pick_next_task
the code has the same guarantees as if it used rcu_assign_pointer().

Reading through all of the implementations of pick_next_task it
appears pick_next_task is limited to modifying the task_struct fields
"->se", "->rt", "->dl".  These fields are the sched_entity structures
of the varies schedulers.

Further "->se.cfs_rq" is only changed in cgroup attach/move operations
initialized by userspace.

Unless I have missed something this means that in practice that the
users of "rcu_dereference(rq->curr)" get normal RCU semantics of
rcu_dereference() for the fields the care about, despite the
assignment of rq->curr in __schedule() ot using rcu_assign_pointer.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Kirill Tkhai <tkhai@yandex.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King - ARM Linux admin <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20190903200603.GW2349@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c |  9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5e5fefb..84c7116 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4033,7 +4033,11 @@ static void __sched notrace __schedule(bool preempt)
 
 	if (likely(prev != next)) {
 		rq->nr_switches++;
-		rq->curr = next;
+		/*
+		 * RCU users of rcu_dereference(rq->curr) may not see
+		 * changes to task_struct made by pick_next_task().
+		 */
+		RCU_INIT_POINTER(rq->curr, next);
 		/*
 		 * The membarrier system call requires each architecture
 		 * to have a full memory barrier after updating
@@ -6060,7 +6064,8 @@ void init_idle(struct task_struct *idle, int cpu)
 	__set_task_cpu(idle, cpu);
 	rcu_read_unlock();
 
-	rq->curr = rq->idle = idle;
+	rq->idle = idle;
+	rcu_assign_pointer(rq->curr, idle);
 	idle->on_rq = TASK_ON_RQ_QUEUED;
 #ifdef CONFIG_SMP
 	idle->on_cpu = 1;

  parent reply	other threads:[~2019-09-27  8:11 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-30 14:08 [BUG] Use of probe_kernel_address() in task_rcu_dereference() without checking return value Russell King - ARM Linux admin
2019-08-30 15:24 ` Oleg Nesterov
2019-08-30 15:30 ` Linus Torvalds
2019-08-30 15:40   ` Russell King - ARM Linux admin
2019-08-30 15:43     ` Linus Torvalds
2019-08-30 15:41   ` Linus Torvalds
2019-08-30 16:09     ` Oleg Nesterov
2019-08-30 16:21       ` Linus Torvalds
2019-08-30 16:44         ` Oleg Nesterov
2019-08-30 16:58           ` Linus Torvalds
2019-08-30 19:36         ` Eric W. Biederman
2019-09-02 13:40           ` Oleg Nesterov
2019-09-02 13:53             ` Peter Zijlstra
2019-09-02 14:44               ` Oleg Nesterov
2019-09-02 16:20                 ` Peter Zijlstra
2019-09-02 17:04             ` Eric W. Biederman
2019-09-02 17:34               ` Linus Torvalds
2019-09-03  4:50                 ` [PATCH 0/3] task: Making tasks on the runqueue rcu protected Eric W. Biederman
2019-09-03  4:51                   ` [PATCH 1/3] task: Add a count of task rcu users Eric W. Biederman
2019-09-04 14:36                     ` Oleg Nesterov
2019-09-04 14:44                     ` Frederic Weisbecker
2019-09-04 15:32                       ` Oleg Nesterov
2019-09-04 16:33                         ` Frederic Weisbecker
2019-09-04 18:20                           ` Linus Torvalds
2019-09-05 14:59                             ` Frederic Weisbecker
2019-09-03  4:52                   ` [PATCH 2/3] task: RCU protect tasks on the runqueue Eric W. Biederman
2019-09-03  7:41                     ` Peter Zijlstra
2019-09-03  7:47                       ` Peter Zijlstra
2019-09-03 16:44                         ` Eric W. Biederman
2019-09-03 17:08                           ` Linus Torvalds
2019-09-03 18:13                             ` Eric W. Biederman
2019-09-03 19:18                               ` Linus Torvalds
2019-09-03 20:06                                 ` Peter Zijlstra
2019-09-03 21:32                                   ` Paul E. McKenney
2019-09-05 20:02                                     ` Eric W. Biederman
2019-09-05 20:55                                       ` Paul E. McKenney
2019-09-06  7:07                                       ` Peter Zijlstra
2019-09-09 12:22                                         ` Eric W. Biederman
2019-09-25  7:36                                           ` Peter Zijlstra
2019-09-27  8:10                                   ` tip-bot2 for Eric W. Biederman [this message]
2019-09-03 19:42                               ` Peter Zijlstra
2019-09-14 12:31                           ` [PATCH v2 1/4] task: Add a count of task rcu users Eric W. Biederman
2019-09-14 12:31                           ` [PATCH v2 2/4] task: Ensure tasks are available for a grace period after leaving the runqueue Eric W. Biederman
2019-09-14 12:32                           ` [PATCH v2 3/4] task: With a grace period after finish_task_switch, remove unnecessary code Eric W. Biederman
2019-09-04 14:22                     ` [PATCH 2/3] task: RCU protect tasks on the runqueue Frederic Weisbecker
2019-09-03  4:52                   ` [PATCH 3/3] task: Clean house now that tasks on the runqueue are rcu protected Eric W. Biederman
2019-09-03  9:45                     ` kbuild test robot
2019-09-03 13:06                     ` Oleg Nesterov
2019-09-03 13:58                   ` [PATCH 0/3] task: Making tasks on the runqueue " Oleg Nesterov
2019-09-03 15:44                   ` Linus Torvalds
2019-09-03 19:46                     ` Peter Zijlstra
     [not found]                   ` <87muf7f4bf.fsf_-_@x220.int.ebiederm.org>
2019-09-14 12:33                     ` [PATCH v2 1/4] task: Add a count of task rcu users Eric W. Biederman
2019-09-15 13:54                       ` Paul E. McKenney
2019-09-27  8:10                       ` [tip: sched/urgent] tasks: Add a count of task RCU users tip-bot2 for Eric W. Biederman
2019-09-14 12:33                     ` [PATCH v2 2/4] task: Ensure tasks are available for a grace period after leaving the runqueue Eric W. Biederman
2019-09-15 14:07                       ` Paul E. McKenney
2019-09-15 14:09                         ` Paul E. McKenney
2019-09-27  8:10                       ` [tip: sched/urgent] tasks, sched/core: " tip-bot2 for Eric W. Biederman
2019-09-14 12:34                     ` [PATCH v2 3/4] task: With a grace period after finish_task_switch, remove unnecessary code Eric W. Biederman
2019-09-15 14:32                       ` Paul E. McKenney
2019-09-15 17:07                         ` Linus Torvalds
2019-09-15 18:47                           ` Paul E. McKenney
2019-09-27  8:10                       ` [tip: sched/urgent] tasks, sched/core: With a grace period after finish_task_switch(), " tip-bot2 for Eric W. Biederman
2019-09-14 12:35                     ` [PATCH v2 4/4] task: RCUify the assignment of rq->curr Eric W. Biederman
2019-09-15 14:41                       ` Paul E. McKenney
2019-09-15 17:59                         ` Eric W. Biederman
2019-09-15 18:25                           ` Eric W. Biederman
2019-09-15 18:48                             ` Paul E. McKenney
2019-09-20 23:02                       ` Frederic Weisbecker
2019-09-26  1:49                         ` Eric W. Biederman
2019-09-26 12:42                           ` Frederic Weisbecker
2019-09-14 17:43                     ` [PATCH v2 0/4] task: Making tasks on the runqueue rcu protected Linus Torvalds
2019-09-17 17:38                       ` Eric W. Biederman
2019-09-25  7:51                         ` Peter Zijlstra
2019-09-26  1:11                           ` Eric W. Biederman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=156957184417.9866.5061690782234958008.tip-bot2@tip-bot2 \
    --to=tip-bot2@linutronix.de \
    --cc=bp@alien8.de \
    --cc=cl@linux.com \
    --cc=cmetcalf@ezchip.com \
    --cc=dave@stgolabs.net \
    --cc=ebiederm@xmission.com \
    --cc=efault@gmx.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=tkhai@yandex.ru \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).