From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932519AbcFCKt2 (ORCPT ); Fri, 3 Jun 2016 06:49:28 -0400 Received: from terminus.zytor.com ([198.137.202.10]:56014 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932308AbcFCKt1 (ORCPT ); Fri, 3 Jun 2016 06:49:27 -0400 Date: Fri, 3 Jun 2016 03:48:32 -0700 From: tip-bot for Oleg Nesterov Message-ID: Cc: tglx@linutronix.de, cmetcalf@ezchip.com, linux-kernel@vger.kernel.org, oleg@redhat.com, efault@gmx.de, ktkhai@parallels.com, hpa@zytor.com, tkhai@yandex.ru, vdavydov@parallels.com, torvalds@linux-foundation.org, mingo@kernel.org, peterz@infradead.org, cl@linux.com Reply-To: torvalds@linux-foundation.org, tkhai@yandex.ru, vdavydov@parallels.com, peterz@infradead.org, cl@linux.com, mingo@kernel.org, cmetcalf@ezchip.com, linux-kernel@vger.kernel.org, tglx@linutronix.de, efault@gmx.de, ktkhai@parallels.com, hpa@zytor.com, oleg@redhat.com In-Reply-To: <20160518170218.GY3192@twins.programming.kicks-ass.net> References: <20160518170218.GY3192@twins.programming.kicks-ass.net> To: linux-tip-commits@vger.kernel.org Subject: [tip:sched/core] sched/api: Introduce task_rcu_dereference() and try_get_task_struct() Git-Commit-ID: 150593bf869393d10a79f6bd3df2585ecc20a9bb X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit-ID: 150593bf869393d10a79f6bd3df2585ecc20a9bb Gitweb: http://git.kernel.org/tip/150593bf869393d10a79f6bd3df2585ecc20a9bb Author: Oleg Nesterov AuthorDate: Wed, 18 May 2016 19:02:18 +0200 Committer: Ingo Molnar CommitDate: Fri, 3 Jun 2016 09:18:57 +0200 sched/api: Introduce task_rcu_dereference() and try_get_task_struct() Generally task_struct is only protected by RCU if it was found on a RCU protected list (say, for_each_process() or find_task_by_vpid()). As Kirill pointed out rq->curr isn't protected by RCU, the scheduler drops the (potentially) last reference without RCU gp, this means that we need to fix the code which uses foreign_rq->curr under rcu_read_lock(). Add a new helper which can be used to dereference rq->curr or any other pointer to task_struct assuming that it should be cleared or updated before the final put_task_struct(). It returns non-NULL only if this task can't go away before rcu_read_unlock(). ( Also add try_get_task_struct() to make it easier to use this API correctly. ) Suggested-by: Kirill Tkhai Signed-off-by: Oleg Nesterov [ Updated comments; added try_get_task_struct()] Signed-off-by: Peter Zijlstra (Intel) Cc: Chris Metcalf Cc: Christoph Lameter Cc: Kirill Tkhai Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Vladimir Davydov Link: http://lkml.kernel.org/r/20160518170218.GY3192@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar --- include/linux/sched.h | 3 ++ kernel/exit.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 79 insertions(+) diff --git a/include/linux/sched.h b/include/linux/sched.h index 6e42ada..dee41bf 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2139,6 +2139,9 @@ static inline void put_task_struct(struct task_struct *t) __put_task_struct(t); } +struct task_struct *task_rcu_dereference(struct task_struct **ptask); +struct task_struct *try_get_task_struct(struct task_struct **ptask); + #ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN extern void task_cputime(struct task_struct *t, cputime_t *utime, cputime_t *stime); diff --git a/kernel/exit.c b/kernel/exit.c index 9e6e135..2fb4d44 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -211,6 +211,82 @@ repeat: } /* + * Note that if this function returns a valid task_struct pointer (!NULL) + * task->usage must remain >0 for the duration of the RCU critical section. + */ +struct task_struct *task_rcu_dereference(struct task_struct **ptask) +{ + struct sighand_struct *sighand; + struct task_struct *task; + + /* + * We need to verify that release_task() was not called and thus + * delayed_put_task_struct() can't run and drop the last reference + * before rcu_read_unlock(). We check task->sighand != NULL, + * but we can read the already freed and reused memory. + */ +retry: + task = rcu_dereference(*ptask); + if (!task) + return NULL; + + probe_kernel_address(&task->sighand, sighand); + + /* + * Pairs with atomic_dec_and_test() in put_task_struct(). If this task + * was already freed we can not miss the preceding update of this + * pointer. + */ + smp_rmb(); + if (unlikely(task != READ_ONCE(*ptask))) + goto retry; + + /* + * We've re-checked that "task == *ptask", now we have two different + * cases: + * + * 1. This is actually the same task/task_struct. In this case + * sighand != NULL tells us it is still alive. + * + * 2. This is another task which got the same memory for task_struct. + * We can't know this of course, and we can not trust + * sighand != NULL. + * + * In this case we actually return a random value, but this is + * correct. + * + * If we return NULL - we can pretend that we actually noticed that + * *ptask was updated when the previous task has exited. Or pretend + * that probe_slab_address(&sighand) reads NULL. + * + * If we return the new task (because sighand is not NULL for any + * reason) - this is fine too. This (new) task can't go away before + * another gp pass. + * + * And note: We could even eliminate the false positive if re-read + * task->sighand once again to avoid the falsely NULL. But this case + * is very unlikely so we don't care. + */ + if (!sighand) + return NULL; + + return task; +} + +struct task_struct *try_get_task_struct(struct task_struct **ptask) +{ + struct task_struct *task; + + rcu_read_lock(); + task = task_rcu_dereference(ptask); + if (task) + get_task_struct(task); + rcu_read_unlock(); + + return task; +} + +/* * Determine if a process group is "orphaned", according to the POSIX * definition in 2.2.2.52. Orphaned process groups are not to be affected * by terminal-generated stop signals. Newly orphaned process groups are