From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752292AbdANMd2 (ORCPT ); Sat, 14 Jan 2017 07:33:28 -0500 Received: from terminus.zytor.com ([198.137.202.10]:51126 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752276AbdANMdZ (ORCPT ); Sat, 14 Jan 2017 07:33:25 -0500 Date: Sat, 14 Jan 2017 04:32:39 -0800 From: tip-bot for Davidlohr Bueso Message-ID: Cc: oleg@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, mingo@kernel.org, akpm@linux-foundation.org, tglx@linutronix.de, dbueso@suse.de, peterz@infradead.org, dave@stgolabs.net, paulmck@linux.vnet.ibm.com, hpa@zytor.com Reply-To: tglx@linutronix.de, akpm@linux-foundation.org, mingo@kernel.org, torvalds@linux-foundation.org, oleg@redhat.com, linux-kernel@vger.kernel.org, paulmck@linux.vnet.ibm.com, hpa@zytor.com, peterz@infradead.org, dave@stgolabs.net, dbueso@suse.de In-Reply-To: <1484148146-14210-2-git-send-email-dave@stgolabs.net> References: <1484148146-14210-2-git-send-email-dave@stgolabs.net> To: linux-tip-commits@vger.kernel.org Subject: [tip:locking/core] sched/wait, RCU: Introduce rcuwait machinery Git-Commit-ID: 8f95c90ceb541a38ac16fec48c05142ef1450c25 X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit-ID: 8f95c90ceb541a38ac16fec48c05142ef1450c25 Gitweb: http://git.kernel.org/tip/8f95c90ceb541a38ac16fec48c05142ef1450c25 Author: Davidlohr Bueso AuthorDate: Wed, 11 Jan 2017 07:22:25 -0800 Committer: Ingo Molnar CommitDate: Sat, 14 Jan 2017 11:14:33 +0100 sched/wait, RCU: Introduce rcuwait machinery rcuwait provides support for (single) RCU-safe task wait/wake functionality, with the caveat that it must not be called after exit_notify(), such that we avoid racing with rcu delayed_put_task_struct callbacks, task_struct being rcu unaware in this context -- for which we similarly have task_rcu_dereference() magic, but with different return semantics, which can conflict with the wakeup side. The interfaces are quite straightforward: rcuwait_wait_event() rcuwait_wake_up() More details are in the comments, but it's perhaps worth mentioning at least, that users must provide proper serialization when waiting on a condition, and avoid corrupting a concurrent waiter. Also care must be taken between the task and the condition for when calling the wakeup -- we cannot miss wakeups. When porting users, this is for example, a given when using waitqueues in that everything is done under the q->lock. As such, it can remove sources of non preemptable unbounded work for realtime. Signed-off-by: Davidlohr Bueso Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Oleg Nesterov Cc: Andrew Morton Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1484148146-14210-2-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar --- include/linux/rcuwait.h | 63 +++++++++++++++++++++++++++++++++++++++++++++++++ kernel/exit.c | 30 +++++++++++++++++++++++ 2 files changed, 93 insertions(+) diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h new file mode 100644 index 0000000..0e93d56 --- /dev/null +++ b/include/linux/rcuwait.h @@ -0,0 +1,63 @@ +#ifndef _LINUX_RCUWAIT_H_ +#define _LINUX_RCUWAIT_H_ + +#include + +/* + * rcuwait provides a way of blocking and waking up a single + * task in an rcu-safe manner; where it is forbidden to use + * after exit_notify(). task_struct is not properly rcu protected, + * unless dealing with rcu-aware lists, ie: find_task_by_*(). + * + * Alternatively we have task_rcu_dereference(), but the return + * semantics have different implications which would break the + * wakeup side. The only time @task is non-nil is when a user is + * blocked (or checking if it needs to) on a condition, and reset + * as soon as we know that the condition has succeeded and are + * awoken. + */ +struct rcuwait { + struct task_struct *task; +}; + +#define __RCUWAIT_INITIALIZER(name) \ + { .task = NULL, } + +static inline void rcuwait_init(struct rcuwait *w) +{ + w->task = NULL; +} + +extern void rcuwait_wake_up(struct rcuwait *w); + +/* + * The caller is responsible for locking around rcuwait_wait_event(), + * such that writes to @task are properly serialized. + */ +#define rcuwait_wait_event(w, condition) \ +({ \ + /* \ + * Complain if we are called after do_exit()/exit_notify(), \ + * as we cannot rely on the rcu critical region for the \ + * wakeup side. \ + */ \ + WARN_ON(current->exit_state); \ + \ + rcu_assign_pointer((w)->task, current); \ + for (;;) { \ + /* \ + * Implicit barrier (A) pairs with (B) in \ + * rcuwait_trywake(). \ + */ \ + set_current_state(TASK_UNINTERRUPTIBLE); \ + if (condition) \ + break; \ + \ + schedule(); \ + } \ + \ + WRITE_ONCE((w)->task, NULL); \ + __set_current_state(TASK_RUNNING); \ +}) + +#endif /* _LINUX_RCUWAIT_H_ */ diff --git a/kernel/exit.c b/kernel/exit.c index 27c6865..a9441da 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -55,6 +55,7 @@ #include #include #include +#include #include #include @@ -282,6 +283,35 @@ retry: return task; } +void rcuwait_wake_up(struct rcuwait *w) +{ + struct task_struct *task; + + rcu_read_lock(); + + /* + * Order condition vs @task, such that everything prior to the load + * of @task is visible. This is the condition as to why the user called + * rcuwait_trywake() in the first place. Pairs with set_current_state() + * barrier (A) in rcuwait_wait_event(). + * + * WAIT WAKE + * [S] tsk = current [S] cond = true + * MB (A) MB (B) + * [L] cond [L] tsk + */ + smp_rmb(); /* (B) */ + + /* + * Avoid using task_rcu_dereference() magic as long as we are careful, + * see comment in rcuwait_wait_event() regarding ->exit_state. + */ + task = rcu_dereference(w->task); + if (task) + wake_up_process(task); + rcu_read_unlock(); +} + struct task_struct *try_get_task_struct(struct task_struct **ptask) { struct task_struct *task;