LKML Archive on lore.kernel.org
* [PATCH v2 0/2] sched: Introduce rcuwait
@ 2017-01-11 15:22 Davidlohr Bueso
  2017-01-11 15:22 ` [PATCH 1/2] sched: Introduce rcuwait machinery Davidlohr Bueso
  2017-01-11 15:22 ` [PATCH 2/2] locking/percpu-rwsem: Replace waitqueue with rcuwait Davidlohr Bueso
  0 siblings, 2 replies; 11+ messages in thread
From: Davidlohr Bueso @ 2017-01-11 15:22 UTC
  To: mingo, peterz; +Cc: oleg, dave, linux-kernel

Changes from v1:
 - Renamed rcuwait_trywake() to rcuwait_wake_up().
 - Added Oleg's review tags.

Hi,

Here's an updated version of the pcpu rwsem writer wait/wake changes
with the abstractions requested by Oleg. Patch 1 adds rcuwait (for lack
of a better name), and patch 2 trivially makes use of it.

It has survived torture testing, which is very handy in this case,
particularly with an equal number of reader and writer threads.

Applies on top of Linus' tree (4.10-rc3).

Thanks.

Davidlohr Bueso (2):
  sched: Introduce rcuwait machinery
  locking/percpu-rwsem: Replace waitqueue with rcuwait

 include/linux/percpu-rwsem.h  |  8 +++---
 include/linux/rcuwait.h       | 63 +++++++++++++++++++++++++++++++++++++++++++
 kernel/exit.c                 | 30 +++++++++++++++++++++
 kernel/locking/percpu-rwsem.c |  7 +++--
 4 files changed, 100 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/rcuwait.h

-- 
2.6.6

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] sched: Introduce rcuwait machinery
  2017-01-11 15:22 [PATCH v2 0/2] sched: Introduce rcuwait Davidlohr Bueso
@ 2017-01-11 15:22 ` Davidlohr Bueso
  2017-01-14 12:32   ` [tip:locking/core] sched/wait, RCU: " tip-bot for Davidlohr Bueso
  2017-01-11 15:22 ` [PATCH 2/2] locking/percpu-rwsem: Replace waitqueue with rcuwait Davidlohr Bueso
  1 sibling, 1 reply; 11+ messages in thread
From: Davidlohr Bueso @ 2017-01-11 15:22 UTC
  To: mingo, peterz; +Cc: oleg, dave, linux-kernel, Davidlohr Bueso

rcuwait provides support for (single) rcu-safe task wait/wake functionality,
with the caveat that it must not be called after exit_notify(). This avoids
racing with rcu delayed_put_task_struct() callbacks, since task_struct is not
rcu-aware in this context. task_rcu_dereference() exists for a similar
purpose, but its different return semantics can conflict with the wakeup side.

The interfaces are quite straightforward:

rcuwait_wait_event()
rcuwait_wake_up()

More details are in the comments, but it is worth mentioning that users
must provide proper serialization when waiting on a condition, so as not to
corrupt a concurrent waiter. Care must also be taken with the ordering
between the task and the condition when issuing the wakeup, as wakeups must
not be missed. When porting waitqueue users, this is already a given, since
everything there is done under q->lock. As such, rcuwait can remove sources
of non-preemptible unbounded work for realtime.
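The wakeup ordering described above is a store-buffering pattern: the waiter stores its task pointer and then loads the condition, while the waker stores the condition and then loads the task pointer. As a rough illustration only, the following userspace sketch models that pairing with C11 atomics and POSIX semaphores. struct utask, struct uwait and the uwait_* functions are hypothetical stand-ins for task_struct, struct rcuwait, rcuwait_wait_event() and rcuwait_wake_up(); they are not kernel API. The sketch uses a full fence on both sides, as the MB (A)/MB (B) diagram in rcuwait_wake_up() describes. A read barrier alone orders only loads against loads, so it would not order the waker's store to the condition before its load of the task pointer. Unlike the kernel version, nothing here plays the role of the RCU read-side critical section, so a struct utask must outlive any waker that might still post to it.

```c
/*
 * Userspace sketch of the rcuwait wait/wake pairing (hypothetical names,
 * not the kernel API). C11 fences stand in for set_current_state() and
 * the wake-side barrier; a POSIX semaphore stands in for schedule() and
 * wake_up_process(). Waiter and waker run on separate pthreads.
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-waiter "task": its semaphore is the sleep/wake token. */
struct utask {
	sem_t wake; /* must be sem_init()'ed to 0 before use */
};

/* Hypothetical analogue of struct rcuwait: at most one registered waiter. */
struct uwait {
	_Atomic(struct utask *) task;
};

static void uwait_init(struct uwait *w)
{
	atomic_init(&w->task, NULL);
}

/* Waker: the caller has already stored the condition. */
static void uwait_wake_up(struct uwait *w)
{
	/* (B): order the earlier store to the condition before the load
	 * of w->task. Store->load ordering needs a full fence. */
	atomic_thread_fence(memory_order_seq_cst);
	struct utask *t = atomic_load_explicit(&w->task, memory_order_relaxed);

	if (t)
		sem_post(&t->wake); /* like wake_up_process(); a spare token is harmless */
}

/* Waiter: callers must serialize waiters, as with rcuwait_wait_event(). */
static void uwait_wait(struct uwait *w, struct utask *self, atomic_bool *cond)
{
	atomic_store_explicit(&w->task, self, memory_order_relaxed);
	for (;;) {
		/* (A): order the store to w->task before the load of *cond. */
		atomic_thread_fence(memory_order_seq_cst);
		if (atomic_load_explicit(cond, memory_order_relaxed))
			break;
		sem_wait(&self->wake); /* sleep; EINTR or stale tokens just recheck */
	}
	atomic_store_explicit(&w->task, NULL, memory_order_relaxed);
}
```

A waker sets the condition and then calls uwait_wake_up(); the waiter calls uwait_wait() with its own struct utask. Extra sem_post() tokens are harmless, because the waiter always rechecks the condition before sleeping again.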

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/rcuwait.h | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/exit.c           | 30 +++++++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100644 include/linux/rcuwait.h

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
new file mode 100644
index 000000000000..0e93d56c7ab2
--- /dev/null
+++ b/include/linux/rcuwait.h
@@ -0,0 +1,63 @@
+#ifndef _LINUX_RCUWAIT_H_
+#define _LINUX_RCUWAIT_H_
+
+#include <linux/rcupdate.h>
+
+/*
+ * rcuwait provides a way of blocking and waking up a single
+ * task in an rcu-safe manner; where it is forbidden to use
+ * after exit_notify(). task_struct is not properly rcu protected,
+ * unless dealing with rcu-aware lists, ie: find_task_by_*().
+ *
+ * Alternatively we have task_rcu_dereference(), but the return
+ * semantics have different implications which would break the
+ * wakeup side. The only time @task is non-nil is when a user is
+ * blocked (or checking if it needs to) on a condition, and reset
+ * as soon as we know that the condition has succeeded and are
+ * awoken.
+ */
+struct rcuwait {
+	struct task_struct *task;
+};
+
+#define __RCUWAIT_INITIALIZER(name)		\
+	{ .task = NULL, }
+
+static inline void rcuwait_init(struct rcuwait *w)
+{
+	w->task = NULL;
+}
+
+extern void rcuwait_wake_up(struct rcuwait *w);
+
+/*
+ * The caller is responsible for locking around rcuwait_wait_event(),
+ * such that writes to @task are properly serialized.
+ */
+#define rcuwait_wait_event(w, condition)				\
+({									\
+	/*								\
+	 * Complain if we are called after do_exit()/exit_notify(),     \
+	 * as we cannot rely on the rcu critical region for the		\
+	 * wakeup side.							\
+	 */                                                             \
+	WARN_ON(current->exit_state);                                   \
+									\
+	rcu_assign_pointer((w)->task, current);				\
+	for (;;) {							\
+		/*							\
+		 * Implicit barrier (A) pairs with (B) in		\
+		 * rcuwait_trywake().					\
+		 */							\
+		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		if (condition)						\
+			break;						\
+									\
+		schedule();						\
+	}								\
+									\
+	WRITE_ONCE((w)->task, NULL);					\
+	__set_current_state(TASK_RUNNING);				\
+})
+
+#endif /* _LINUX_RCUWAIT_H_ */
diff --git a/kernel/exit.c b/kernel/exit.c
index 8f14b866f9f6..3e0aa08bdf4e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -55,6 +55,7 @@
 #include <linux/shm.h>
 #include <linux/kcov.h>
 #include <linux/random.h>
+#include <linux/rcuwait.h>
 
 #include <linux/uaccess.h>
 #include <asm/unistd.h>
@@ -282,6 +283,35 @@ struct task_struct *task_rcu_dereference(struct task_struct **ptask)
 	return task;
 }
 
+void rcuwait_wake_up(struct rcuwait *w)
+{
+	struct task_struct *task;
+
+	rcu_read_lock();
+
+	/*
+	 * Order condition vs @task, such that everything prior to the load
+	 * of @task is visible. This is the condition as to why the user called
+	 * rcuwait_trywake() in the first place. Pairs with set_current_state()
+	 * barrier (A) in rcuwait_wait_event().
+	 *
+	 *    WAIT                WAKE
+	 *    [S] tsk = current	  [S] cond = true
+	 *        MB (A)	      MB (B)
+	 *    [L] cond		  [L] tsk
+	 */
+	smp_rmb(); /* (B) */
+
+	/*
+	 * Avoid using task_rcu_dereference() magic as long as we are careful,
+	 * see comment in rcuwait_wait_event() regarding ->exit_state.
+	 */
+	task = rcu_dereference(w->task);
+	if (task)
+		wake_up_process(task);
+	rcu_read_unlock();
+}
+
 struct task_struct *try_get_task_struct(struct task_struct **ptask)
 {
 	struct task_struct *task;
-- 
2.6.6

* [PATCH 2/2] locking/percpu-rwsem: Replace waitqueue with rcuwait
  2017-01-11 15:22 [PATCH v2 0/2] sched: Introduce rcuwait Davidlohr Bueso
  2017-01-11 15:22 ` [PATCH 1/2] sched: Introduce rcuwait machinery Davidlohr Bueso
@ 2017-01-11 15:22 ` Davidlohr Bueso
  2017-01-14 12:33   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  1 sibling, 1 reply; 11+ messages in thread
From: Davidlohr Bueso @ 2017-01-11 15:22 UTC
  To: mingo, peterz; +Cc: oleg, dave, linux-kernel, Davidlohr Bueso

Using any kind of wait queue is overkill for pcpu-rwsems. One option
would be the lighter-weight simple waitqueue (swait) flavor, but even
that is more than pcpu-rwsems need. For one, we do not care about any
sort of queuing: the only (rare) time writers (and readers, for that
matter) are queued is when trying to acquire the contended regular
rw_sem. There cannot be any further queuing, as writers are already
serialized by the rw_sem.

Given that percpu_down_write() must not be called after exit_notify(),
we can replace the bulky waitqueue with rcuwait, so that a writer can
wait for its turn to take the lock. This avoids the queue handling and
locking overhead.
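To illustrate why a single wait slot is enough here, the following userspace sketch models only the part this patch touches. It assumes C11 atomics plus POSIX threads and semaphores; all names are hypothetical, and the reader fast path, rcu_sync and readers_block handling are omitted. A mutex stands in for ->rw_sem, so at most one writer ever waits, and each exiting reader prods that writer the way __percpu_up_read() calls rcuwait_wake_up().

```c
/*
 * Hypothetical userspace model of the percpu-rwsem writer wait, not the
 * kernel code: only the "writer waits for active readers" half is shown.
 */
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>

struct rwsem_sketch {
	pthread_mutex_t writer_lock; /* plays ->rw_sem: writers are serialized */
	atomic_int readers;          /* plays the summed per-CPU read_count */
	atomic_bool writer_waiting;  /* plays rcuwait's ->task: one waiter max */
	sem_t writer_wake;           /* wake token for that single waiter */
};

static void rwsem_sketch_init(struct rwsem_sketch *s)
{
	pthread_mutex_init(&s->writer_lock, NULL);
	atomic_init(&s->readers, 0);
	atomic_init(&s->writer_waiting, false);
	sem_init(&s->writer_wake, 0, 0);
}

static void sketch_read_enter(struct rwsem_sketch *s)
{
	atomic_fetch_add_explicit(&s->readers, 1, memory_order_relaxed);
}

/* Like __percpu_up_read(): drop the count, then prod the writer. */
static void sketch_read_exit(struct rwsem_sketch *s)
{
	atomic_fetch_sub_explicit(&s->readers, 1, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst); /* (B) */
	if (atomic_load_explicit(&s->writer_waiting, memory_order_relaxed))
		sem_post(&s->writer_wake);
}

/* Like the tail of percpu_down_write(): wait for active readers. */
static void sketch_write_lock(struct rwsem_sketch *s)
{
	pthread_mutex_lock(&s->writer_lock); /* only one writer gets past here */
	atomic_store_explicit(&s->writer_waiting, true, memory_order_relaxed);
	for (;;) {
		atomic_thread_fence(memory_order_seq_cst); /* (A) */
		if (atomic_load_explicit(&s->readers, memory_order_acquire) == 0)
			break;
		sem_wait(&s->writer_wake); /* stale tokens just cause a recheck */
	}
	atomic_store_explicit(&s->writer_waiting, false, memory_order_relaxed);
}

static void sketch_write_unlock(struct rwsem_sketch *s)
{
	pthread_mutex_unlock(&s->writer_lock);
}
```

Because writers are serialized, one flag and one semaphore embedded in the lock replace the wait queue. Keeping the semaphore in the long-lived lock object also sidesteps the waiter-lifetime question that RCU answers for task_struct in the kernel.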

Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/percpu-rwsem.h  | 8 ++++----
 kernel/locking/percpu-rwsem.c | 7 +++----
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 5b2e6159b744..93664f022ecf 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -4,15 +4,15 @@
 #include <linux/atomic.h>
 #include <linux/rwsem.h>
 #include <linux/percpu.h>
-#include <linux/wait.h>
+#include <linux/rcuwait.h>
 #include <linux/rcu_sync.h>
 #include <linux/lockdep.h>
 
 struct percpu_rw_semaphore {
 	struct rcu_sync		rss;
 	unsigned int __percpu	*read_count;
-	struct rw_semaphore	rw_sem;
-	wait_queue_head_t	writer;
+	struct rw_semaphore	rw_sem; /* slowpath */
+	struct rcuwait          writer; /* blocked writer */
 	int			readers_block;
 };
 
@@ -22,7 +22,7 @@ static struct percpu_rw_semaphore name = {				\
 	.rss = __RCU_SYNC_INITIALIZER(name.rss, RCU_SCHED_SYNC),	\
 	.read_count = &__percpu_rwsem_rc_##name,			\
 	.rw_sem = __RWSEM_INITIALIZER(name.rw_sem),			\
-	.writer = __WAIT_QUEUE_HEAD_INITIALIZER(name.writer),		\
+	.writer = __RCUWAIT_INITIALIZER(name.writer),			\
 }
 
 extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index ce182599cf2e..883cf1b92d90 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -1,7 +1,6 @@
 #include <linux/atomic.h>
 #include <linux/rwsem.h>
 #include <linux/percpu.h>
-#include <linux/wait.h>
 #include <linux/lockdep.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/rcupdate.h>
@@ -18,7 +17,7 @@ int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
 	/* ->rw_sem represents the whole percpu_rw_semaphore for lockdep */
 	rcu_sync_init(&sem->rss, RCU_SCHED_SYNC);
 	__init_rwsem(&sem->rw_sem, name, rwsem_key);
-	init_waitqueue_head(&sem->writer);
+	rcuwait_init(&sem->writer);
 	sem->readers_block = 0;
 	return 0;
 }
@@ -103,7 +102,7 @@ void __percpu_up_read(struct percpu_rw_semaphore *sem)
 	__this_cpu_dec(*sem->read_count);
 
 	/* Prod writer to recheck readers_active */
-	wake_up(&sem->writer);
+	rcuwait_wake_up(&sem->writer);
 }
 EXPORT_SYMBOL_GPL(__percpu_up_read);
 
@@ -160,7 +159,7 @@ void percpu_down_write(struct percpu_rw_semaphore *sem)
 	 */
 
 	/* Wait for all now active readers to complete. */
-	wait_event(sem->writer, readers_active_check(sem));
+	rcuwait_wait_event(&sem->writer, readers_active_check(sem));
 }
 EXPORT_SYMBOL_GPL(percpu_down_write);
 
-- 
2.6.6

* [tip:locking/core] sched/wait, RCU: Introduce rcuwait machinery
  2017-01-11 15:22 ` [PATCH 1/2] sched: Introduce rcuwait machinery Davidlohr Bueso
@ 2017-01-14 12:32   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2017-01-14 12:32 UTC
  To: linux-tip-commits
  Cc: oleg, linux-kernel, torvalds, mingo, akpm, tglx, dbueso, peterz,
	dave, paulmck, hpa

Commit-ID:  8f95c90ceb541a38ac16fec48c05142ef1450c25
Gitweb:     http://git.kernel.org/tip/8f95c90ceb541a38ac16fec48c05142ef1450c25
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Wed, 11 Jan 2017 07:22:25 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 11:14:33 +0100

sched/wait, RCU: Introduce rcuwait machinery

rcuwait provides support for (single) RCU-safe task wait/wake functionality,
with the caveat that it must not be called after exit_notify(). This avoids
racing with rcu delayed_put_task_struct() callbacks, since task_struct is not
rcu-aware in this context. task_rcu_dereference() exists for a similar
purpose, but its different return semantics can conflict with the wakeup side.

The interfaces are quite straightforward:

  rcuwait_wait_event()
  rcuwait_wake_up()

More details are in the comments, but it is worth mentioning that users
must provide proper serialization when waiting on a condition, so as not to
corrupt a concurrent waiter. Care must also be taken with the ordering
between the task and the condition when issuing the wakeup, as wakeups must
not be missed. When porting waitqueue users, this is already a given, since
everything there is done under q->lock. As such, rcuwait can remove sources
of non-preemptible unbounded work for realtime.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1484148146-14210-2-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/rcuwait.h | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/exit.c           | 30 +++++++++++++++++++++++
 2 files changed, 93 insertions(+)

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
new file mode 100644
index 0000000..0e93d56
--- /dev/null
+++ b/include/linux/rcuwait.h
@@ -0,0 +1,63 @@
+#ifndef _LINUX_RCUWAIT_H_
+#define _LINUX_RCUWAIT_H_
+
+#include <linux/rcupdate.h>
+
+/*
+ * rcuwait provides a way of blocking and waking up a single
+ * task in an rcu-safe manner; where it is forbidden to use
+ * after exit_notify(). task_struct is not properly rcu protected,
+ * unless dealing with rcu-aware lists, ie: find_task_by_*().
+ *
+ * Alternatively we have task_rcu_dereference(), but the return
+ * semantics have different implications which would break the
+ * wakeup side. The only time @task is non-nil is when a user is
+ * blocked (or checking if it needs to) on a condition, and reset
+ * as soon as we know that the condition has succeeded and are
+ * awoken.
+ */
+struct rcuwait {
+	struct task_struct *task;
+};
+
+#define __RCUWAIT_INITIALIZER(name)		\
+	{ .task = NULL, }
+
+static inline void rcuwait_init(struct rcuwait *w)
+{
+	w->task = NULL;
+}
+
+extern void rcuwait_wake_up(struct rcuwait *w);
+
+/*
+ * The caller is responsible for locking around rcuwait_wait_event(),
+ * such that writes to @task are properly serialized.
+ */
+#define rcuwait_wait_event(w, condition)				\
+({									\
+	/*								\
+	 * Complain if we are called after do_exit()/exit_notify(),     \
+	 * as we cannot rely on the rcu critical region for the		\
+	 * wakeup side.							\
+	 */                                                             \
+	WARN_ON(current->exit_state);                                   \
+									\
+	rcu_assign_pointer((w)->task, current);				\
+	for (;;) {							\
+		/*							\
+		 * Implicit barrier (A) pairs with (B) in		\
+		 * rcuwait_trywake().					\
+		 */							\
+		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		if (condition)						\
+			break;						\
+									\
+		schedule();						\
+	}								\
+									\
+	WRITE_ONCE((w)->task, NULL);					\
+	__set_current_state(TASK_RUNNING);				\
+})
+
+#endif /* _LINUX_RCUWAIT_H_ */
diff --git a/kernel/exit.c b/kernel/exit.c
index 27c6865..a9441da 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -55,6 +55,7 @@
 #include <linux/shm.h>
 #include <linux/kcov.h>
 #include <linux/random.h>
+#include <linux/rcuwait.h>
 
 #include <linux/uaccess.h>
 #include <asm/unistd.h>
@@ -282,6 +283,35 @@ retry:
 	return task;
 }
 
+void rcuwait_wake_up(struct rcuwait *w)
+{
+	struct task_struct *task;
+
+	rcu_read_lock();
+
+	/*
+	 * Order condition vs @task, such that everything prior to the load
+	 * of @task is visible. This is the condition as to why the user called
+	 * rcuwait_trywake() in the first place. Pairs with set_current_state()
+	 * barrier (A) in rcuwait_wait_event().
+	 *
+	 *    WAIT                WAKE
+	 *    [S] tsk = current	  [S] cond = true
+	 *        MB (A)	      MB (B)
+	 *    [L] cond		  [L] tsk
+	 */
+	smp_rmb(); /* (B) */
+
+	/*
+	 * Avoid using task_rcu_dereference() magic as long as we are careful,
+	 * see comment in rcuwait_wait_event() regarding ->exit_state.
+	 */
+	task = rcu_dereference(w->task);
+	if (task)
+		wake_up_process(task);
+	rcu_read_unlock();
+}
+
 struct task_struct *try_get_task_struct(struct task_struct **ptask)
 {
 	struct task_struct *task;

* [tip:locking/core] locking/percpu-rwsem: Replace waitqueue with rcuwait
  2017-01-11 15:22 ` [PATCH 2/2] locking/percpu-rwsem: Replace waitqueue with rcuwait Davidlohr Bueso
@ 2017-01-14 12:33   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 11+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2017-01-14 12:33 UTC
  To: linux-tip-commits
  Cc: oleg, tglx, hpa, akpm, peterz, mingo, paulmck, torvalds, dbueso,
	dave, linux-kernel

Commit-ID:  52b94129f274937e4c25dd17b76697664a3c43c9
Gitweb:     http://git.kernel.org/tip/52b94129f274937e4c25dd17b76697664a3c43c9
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Wed, 11 Jan 2017 07:22:26 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Sat, 14 Jan 2017 11:14:35 +0100

locking/percpu-rwsem: Replace waitqueue with rcuwait

Using any kind of wait queue is overkill for pcpu-rwsems. One option
would be the lighter-weight simple waitqueue (swait) flavor, but even
that is more than pcpu-rwsems need. For one, we do not care about any
sort of queuing: the only (rare) time writers (and readers, for that
matter) are queued is when trying to acquire the contended regular
rw_sem. There cannot be any further queuing, as writers are already
serialized by the rw_sem.

Given that percpu_down_write() must not be called after exit_notify(),
we can replace the bulky waitqueue with rcuwait, so that a writer can
wait for its turn to take the lock. This avoids the queue handling and
locking overhead.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1484148146-14210-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/percpu-rwsem.h  | 8 ++++----
 kernel/locking/percpu-rwsem.c | 7 +++----
 2 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 5b2e615..93664f0 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -4,15 +4,15 @@
 #include <linux/atomic.h>
 #include <linux/rwsem.h>
 #include <linux/percpu.h>
-#include <linux/wait.h>
+#include <linux/rcuwait.h>
 #include <linux/rcu_sync.h>
 #include <linux/lockdep.h>
 
 struct percpu_rw_semaphore {
 	struct rcu_sync		rss;
 	unsigned int __percpu	*read_count;
-	struct rw_semaphore	rw_sem;
-	wait_queue_head_t	writer;
+	struct rw_semaphore	rw_sem; /* slowpath */
+	struct rcuwait          writer; /* blocked writer */
 	int			readers_block;
 };
 
@@ -22,7 +22,7 @@ static struct percpu_rw_semaphore name = {				\
 	.rss = __RCU_SYNC_INITIALIZER(name.rss, RCU_SCHED_SYNC),	\
 	.read_count = &__percpu_rwsem_rc_##name,			\
 	.rw_sem = __RWSEM_INITIALIZER(name.rw_sem),			\
-	.writer = __WAIT_QUEUE_HEAD_INITIALIZER(name.writer),		\
+	.writer = __RCUWAIT_INITIALIZER(name.writer),			\
 }
 
 extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
diff --git a/kernel/locking/percpu-rwsem.c b/kernel/locking/percpu-rwsem.c
index ce18259..883cf1b 100644
--- a/kernel/locking/percpu-rwsem.c
+++ b/kernel/locking/percpu-rwsem.c
@@ -1,7 +1,6 @@
 #include <linux/atomic.h>
 #include <linux/rwsem.h>
 #include <linux/percpu.h>
-#include <linux/wait.h>
 #include <linux/lockdep.h>
 #include <linux/percpu-rwsem.h>
 #include <linux/rcupdate.h>
@@ -18,7 +17,7 @@ int __percpu_init_rwsem(struct percpu_rw_semaphore *sem,
 	/* ->rw_sem represents the whole percpu_rw_semaphore for lockdep */
 	rcu_sync_init(&sem->rss, RCU_SCHED_SYNC);
 	__init_rwsem(&sem->rw_sem, name, rwsem_key);
-	init_waitqueue_head(&sem->writer);
+	rcuwait_init(&sem->writer);
 	sem->readers_block = 0;
 	return 0;
 }
@@ -103,7 +102,7 @@ void __percpu_up_read(struct percpu_rw_semaphore *sem)
 	__this_cpu_dec(*sem->read_count);
 
 	/* Prod writer to recheck readers_active */
-	wake_up(&sem->writer);
+	rcuwait_wake_up(&sem->writer);
 }
 EXPORT_SYMBOL_GPL(__percpu_up_read);
 
@@ -160,7 +159,7 @@ void percpu_down_write(struct percpu_rw_semaphore *sem)
 	 */
 
 	/* Wait for all now active readers to complete. */
-	wait_event(sem->writer, readers_active_check(sem));
+	rcuwait_wait_event(&sem->writer, readers_active_check(sem));
 }
 EXPORT_SYMBOL_GPL(percpu_down_write);
 

* Re: [PATCH 1/2] sched: Introduce rcuwait machinery
  2017-01-16  1:32   ` Davidlohr Bueso
@ 2017-01-17 17:41     ` Oleg Nesterov
  0 siblings, 0 replies; 11+ messages in thread
From: Oleg Nesterov @ 2017-01-17 17:41 UTC
  To: Davidlohr Bueso; +Cc: mingo, peterz, linux-kernel, Davidlohr Bueso

On 01/15, Davidlohr Bueso wrote:
>
> On Thu, 22 Dec 2016, Bueso wrote:
>
>> +	WARN_ON(current->exit_state);                                   \
>
> While not related to this patch, but per 3245d6acab9 (exit: fix race
> between wait_consider_task() and wait_task_zombie()), should we not
> *_ONCE() all things ->exit_state?

current->exit_state != 0 is stable. I mean, only current can change it
from zero to non-zero, and once it is non-zero it can't be zero again.

> I'm not really referring to a specific
> bug (much less here, where that race would not matter obviously), but
> if nothing else, for documentation

Oh, I won't argue but I do not agree. To me, READ_ONCE() often adds some
confusion because I can almost never understand if it is actually needed
for correctness or it was added "just in case".

Oleg.

* Re: [PATCH 1/2] sched: Introduce rcuwait machinery
  2016-12-22 17:01 ` [PATCH 1/2] sched: Introduce rcuwait machinery Davidlohr Bueso
  2016-12-22 19:27   ` kbuild test robot
  2016-12-22 19:55   ` kbuild test robot
@ 2017-01-16  1:32   ` Davidlohr Bueso
  2017-01-17 17:41     ` Oleg Nesterov
  2 siblings, 1 reply; 11+ messages in thread
From: Davidlohr Bueso @ 2017-01-16  1:32 UTC
  To: mingo, peterz, oleg; +Cc: linux-kernel, Davidlohr Bueso

On Thu, 22 Dec 2016, Bueso wrote:

>+	WARN_ON(current->exit_state);                                   \

While not related to this patch: per 3245d6acab9 (exit: fix race
between wait_consider_task() and wait_task_zombie()), should we not
*_ONCE() all accesses to ->exit_state? I'm not referring to a specific
bug (much less here, where that race obviously would not matter), but
if nothing else it would serve as documentation, and I doubt it would
make any difference performance-wise.

Thanks,
Davidlohr

* Re: [PATCH 1/2] sched: Introduce rcuwait machinery
  2016-12-22 19:27   ` kbuild test robot
@ 2017-01-03 23:20     ` Davidlohr Bueso
  0 siblings, 0 replies; 11+ messages in thread
From: Davidlohr Bueso @ 2017-01-03 23:20 UTC
  To: kbuild test robot
  Cc: kbuild-all, mingo, peterz, oleg, linux-kernel, Davidlohr Bueso

On Fri, 23 Dec 2016, kbuild test robot wrote:
>>> kernel/exit.c:285:29: warning: 'struct rcuwait' declared inside parameter list will not be visible outside of this definition or declaration
>    void rcuwait_trywake(struct rcuwait *w)
>                                ^~~~~~~

Ah, I'm missing a linux/rcuwait.h include there. Here's v2, thanks.
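The kbuild warning follows from ordinary C scoping: a struct type first named inside a parameter list is scoped to that declaration or definition only (as the warning text says), and it stays incomplete there. A bare forward declaration (struct rcuwait;) at file scope would silence the scope warning, but the function body still dereferences w->task, which needs the complete definition; hence the fix is the include. The sketch below is a standalone stand-in, not code from the patch: the struct mirrors the header's layout but uses void * for the task, and rcuwait_has_waiter() is a hypothetical helper.

```c
/*
 * Standalone illustration (hypothetical helper): the full struct
 * definition must be visible at file scope before any function that takes
 * a struct rcuwait * and dereferences it. In kernel/exit.c, that is what
 * #include <linux/rcuwait.h> provides.
 */
#include <stddef.h>

/* Mirrors the header's layout; void * replaces struct task_struct *. */
struct rcuwait {
	void *task;
};

/*
 * With the definition above removed, the compiler warns that struct
 * rcuwait is declared inside the parameter list and rejects w->task as a
 * use of an incomplete type, matching the kbuild report (exact wording
 * varies between GCC versions).
 */
static int rcuwait_has_waiter(struct rcuwait *w)
{
	return w->task != NULL;
}
```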

-----8<--------------------------------------------
From: Davidlohr Bueso <dave@stgolabs.net>
Subject: [PATCH v2 1/2] sched: Introduce rcuwait machinery

rcuwait provides support for (single) rcu-safe task wait/wake functionality,
with the caveat that it must not be called after exit_notify(). This avoids
racing with rcu delayed_put_task_struct() callbacks, since task_struct is not
rcu-aware in this context. task_rcu_dereference() exists for a similar
purpose, but its different return semantics can conflict with the wakeup side.

The interfaces are quite straightforward:

rcuwait_wait_event()
rcuwait_trywake()

More details are in the comments, but it is worth mentioning that users
must provide proper serialization when waiting on a condition, so as not to
corrupt a concurrent waiter. Care must also be taken with the ordering
between the task and the condition when issuing the wakeup, as wakeups must
not be missed. When porting waitqueue users, this is already a given, since
everything there is done under q->lock.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/rcuwait.h | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/exit.c           | 30 +++++++++++++++++++++++
 2 files changed, 93 insertions(+)
 create mode 100644 include/linux/rcuwait.h

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
new file mode 100644
index 000000000000..3e07beb14c1f
--- /dev/null
+++ b/include/linux/rcuwait.h
@@ -0,0 +1,63 @@
+#ifndef _LINUX_RCUWAIT_H_
+#define _LINUX_RCUWAIT_H_
+
+#include <linux/rcupdate.h>
+
+/*
+ * rcuwait provides a way of blocking and waking up a single
+ * task in an rcu-safe manner; where it is forbidden to use
+ * after exit_notify(). task_struct is not properly rcu protected,
+ * unless dealing with rcu-aware lists, ie: find_task_by_*().
+ *
+ * Alternatively we have task_rcu_dereference(), but the return
+ * semantics have different implications which would break the
+ * wakeup side. The only time @task is non-nil is when a user is
+ * blocked (or checking if it needs to) on a condition, and reset
+ * as soon as we know that the condition has succeeded and are
+ * awoken.
+ */
+struct rcuwait {
+	struct task_struct *task;
+};
+
+#define __RCUWAIT_INITIALIZER(name)		\
+	{ .task = NULL, }
+
+static inline void rcuwait_init(struct rcuwait *w)
+{
+	w->task = NULL;
+}
+
+extern void rcuwait_trywake(struct rcuwait *w);
+
+/*
+ * The caller is responsible for locking around rcuwait_wait_event(),
+ * such that writes to @task are properly serialized.
+ */
+#define rcuwait_wait_event(w, condition)				\
+({									\
+	/*								\
+	 * Complain if we are called after do_exit()/exit_notify(),     \
+	 * as we cannot rely on the rcu critical region for the		\
+	 * wakeup side.							\
+	 */                                                             \
+	WARN_ON(current->exit_state);                                   \
+									\
+	rcu_assign_pointer((w)->task, current);				\
+	for (;;) {							\
+		/*							\
+		 * Implicit barrier (A) pairs with (B) in		\
+		 * rcuwait_trywake().					\
+		 */							\
+		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		if (condition)						\
+			break;						\
+									\
+		schedule();						\
+	}								\
+									\
+	WRITE_ONCE((w)->task, NULL);					\
+	__set_current_state(TASK_RUNNING);				\
+})
+
+#endif /* _LINUX_RCUWAIT_H_ */
diff --git a/kernel/exit.c b/kernel/exit.c
index 8f14b866f9f6..e579b30a35a7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -55,6 +55,7 @@
 #include <linux/shm.h>
 #include <linux/kcov.h>
 #include <linux/random.h>
+#include <linux/rcuwait.h>
 
 #include <linux/uaccess.h>
 #include <asm/unistd.h>
@@ -282,6 +283,35 @@ struct task_struct *task_rcu_dereference(struct task_struct **ptask)
 	return task;
 }
 
+void rcuwait_trywake(struct rcuwait *w)
+{
+	struct task_struct *task;
+
+	rcu_read_lock();
+
+	/*
+	 * Order condition vs @task, such that everything prior to the load
+	 * of @task is visible. This is the condition as to why the user called
+	 * rcuwait_trywake() in the first place. Pairs with set_current_state()
+	 * barrier (A) in rcuwait_wait_event().
+	 *
+	 *    WAIT                WAKE
+	 *    [S] tsk = current	  [S] cond = true
+	 *        MB (A)	      MB (B)
+	 *    [L] cond		  [L] tsk
+	 */
+	smp_rmb(); /* (B) */
+
+	/*
+	 * Avoid using task_rcu_dereference() magic as long as we are careful,
+	 * see comment in rcuwait_wait_event() regarding ->exit_state.
+	 */
+	task = rcu_dereference(w->task);
+	if (task)
+		wake_up_process(task);
+	rcu_read_unlock();
+}
+
 struct task_struct *try_get_task_struct(struct task_struct **ptask)
 {
 	struct task_struct *task;
-- 
2.6.6

* Re: [PATCH 1/2] sched: Introduce rcuwait machinery
  2016-12-22 17:01 ` [PATCH 1/2] sched: Introduce rcuwait machinery Davidlohr Bueso
  2016-12-22 19:27   ` kbuild test robot
@ 2016-12-22 19:55   ` kbuild test robot
  2017-01-16  1:32   ` Davidlohr Bueso
  2 siblings, 0 replies; 11+ messages in thread
From: kbuild test robot @ 2016-12-22 19:55 UTC
  To: Davidlohr Bueso
  Cc: kbuild-all, mingo, peterz, oleg, linux-kernel, dave, Davidlohr Bueso


[-- Attachment #1: Type: text/plain, Size: 11470 bytes --]

Hi Davidlohr,

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.9 next-20161222]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Davidlohr-Bueso/sched-Introduce-rcuwait/20161223-020109
config: m68k-sun3_defconfig (attached as .config)
compiler: m68k-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=m68k 

Note: the linux-review/Davidlohr-Bueso/sched-Introduce-rcuwait/20161223-020109 HEAD 9e9d238f94d5aa8e348e7e70585533fe0dbd373b builds fine.
      It only hurts bisectability.

All error/warnings (new ones prefixed by >>):

>> kernel/exit.c:285:29: warning: 'struct rcuwait' declared inside parameter list
    void rcuwait_trywake(struct rcuwait *w)
                                ^
>> kernel/exit.c:285:29: warning: its scope is only this definition or declaration, which is probably not what you want
   In file included from include/linux/srcu.h:33:0,
                    from include/linux/notifier.h:15,
                    from include/linux/memory_hotplug.h:6,
                    from include/linux/mmzone.h:751,
                    from include/linux/gfp.h:5,
                    from include/linux/mm.h:9,
                    from kernel/exit.c:7:
   kernel/exit.c: In function 'rcuwait_trywake':
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
     task = rcu_dereference(w->task);
                             ^
   include/linux/rcupdate.h:606:10: note: in definition of macro '__rcu_dereference_check'
     typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
             ^
   include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
    #define rcu_dereference(p) rcu_dereference_check(p, 0)
                               ^
   kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
     task = rcu_dereference(w->task);
            ^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
     task = rcu_dereference(w->task);
                             ^
   include/linux/rcupdate.h:606:36: note: in definition of macro '__rcu_dereference_check'
     typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
                                       ^
   include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
    #define rcu_dereference(p) rcu_dereference_check(p, 0)
                               ^
   kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
     task = rcu_dereference(w->task);
            ^
   In file included from include/asm-generic/bug.h:4:0,
                    from arch/m68k/include/asm/bug.h:28,
                    from include/linux/bug.h:4,
                    from include/linux/mmdebug.h:4,
                    from include/linux/mm.h:8,
                    from kernel/exit.c:7:
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
     task = rcu_dereference(w->task);
                             ^
   include/linux/compiler.h:563:9: note: in definition of macro 'lockless_dereference'
     typeof(p) _________p1 = READ_ONCE(p); \
            ^
   include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
     __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
     ^
   include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
    #define rcu_dereference(p) rcu_dereference_check(p, 0)
                               ^
   kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
     task = rcu_dereference(w->task);
            ^
   In file included from include/asm-generic/bug.h:4:0,
                    from arch/m68k/include/asm/bug.h:28,
                    from include/linux/bug.h:4,
                    from include/linux/mmdebug.h:4,
                    from include/linux/mm.h:8,
                    from kernel/exit.c:7:
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
     task = rcu_dereference(w->task);
                             ^
   include/linux/compiler.h:305:17: note: in definition of macro '__READ_ONCE'
     union { typeof(x) __val; char __c[1]; } __u;   \
                    ^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
     typeof(p) _________p1 = READ_ONCE(p); \
                             ^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
     typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
                                                   ^
   include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
     __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
     ^
   include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
    #define rcu_dereference(p) rcu_dereference_check(p, 0)
                               ^
   kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
     task = rcu_dereference(w->task);
            ^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
     task = rcu_dereference(w->task);
                             ^
   include/linux/compiler.h:307:22: note: in definition of macro '__READ_ONCE'
      __read_once_size(&(x), __u.__c, sizeof(x));  \
                         ^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
     typeof(p) _________p1 = READ_ONCE(p); \
                             ^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
     typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
                                                   ^
   include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
     __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
     ^
   include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
    #define rcu_dereference(p) rcu_dereference_check(p, 0)
                               ^
   kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
     task = rcu_dereference(w->task);
            ^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
     task = rcu_dereference(w->task);
                             ^
   include/linux/compiler.h:307:42: note: in definition of macro '__READ_ONCE'
      __read_once_size(&(x), __u.__c, sizeof(x));  \
                                             ^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
     typeof(p) _________p1 = READ_ONCE(p); \
                             ^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
     typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
                                                   ^
   include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
     __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
     ^
   include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
    #define rcu_dereference(p) rcu_dereference_check(p, 0)
                               ^
   kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
     task = rcu_dereference(w->task);
            ^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
     task = rcu_dereference(w->task);
                             ^
   include/linux/compiler.h:309:30: note: in definition of macro '__READ_ONCE'
      __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
                                 ^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
     typeof(p) _________p1 = READ_ONCE(p); \
                             ^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
     typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
                                                   ^
   include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
     __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
     ^
   include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
    #define rcu_dereference(p) rcu_dereference_check(p, 0)
                               ^
   kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
     task = rcu_dereference(w->task);
            ^
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type
     task = rcu_dereference(w->task);
                             ^
   include/linux/compiler.h:309:50: note: in definition of macro '__READ_ONCE'
      __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \
                                                     ^
>> include/linux/compiler.h:563:26: note: in expansion of macro 'READ_ONCE'
     typeof(p) _________p1 = READ_ONCE(p); \
                             ^
>> include/linux/rcupdate.h:606:48: note: in expansion of macro 'lockless_dereference'
     typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
                                                   ^
   include/linux/rcupdate.h:727:2: note: in expansion of macro '__rcu_dereference_check'
     __rcu_dereference_check((p), (c) || rcu_read_lock_held(), __rcu)
     ^
   include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
    #define rcu_dereference(p) rcu_dereference_check(p, 0)
                               ^
   kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
     task = rcu_dereference(w->task);
            ^
   In file included from include/asm-generic/bug.h:4:0,
                    from arch/m68k/include/asm/bug.h:28,
                    from include/linux/bug.h:4,
                    from include/linux/mmdebug.h:4,
                    from include/linux/mm.h:8,
                    from kernel/exit.c:7:

vim +308 kernel/exit.c

   279		if (!sighand)
   280			return NULL;
   281	
   282		return task;
   283	}
   284	
 > 285	void rcuwait_trywake(struct rcuwait *w)
   286	{
   287		struct task_struct *task;
   288	
   289		rcu_read_lock();
   290	
   291		/*
   292		 * Order condition vs @task, such that everything prior to the load
   293		 * of @task is visible. This is the condition as to why the user called
   294		 * rcuwait_trywake() in the first place. Pairs with set_current_state()
   295		 * barrier (A) in rcuwait_wait_event().
   296		 *
   297		 *    WAIT                WAKE
   298		 *    [S] tsk = current	  [S] cond = true
   299		 *        MB (A)	      MB (B)
   300		 *    [L] cond		  [L] tsk
   301		 */
   302		smp_rmb(); /* (B) */
   303	
   304		/*
   305		 * Avoid using task_rcu_dereference() magic as long as we are careful,
   306		 * see comment in rcuwait_wait_event() regarding ->exit_state.
   307		 */
 > 308		task = rcu_dereference(w->task);
   309		if (task)
   310			wake_up_process(task);
   311		rcu_read_unlock();

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 11676 bytes --]


* Re: [PATCH 1/2] sched: Introduce rcuwait machinery
  2016-12-22 17:01 ` [PATCH 1/2] sched: Introduce rcuwait machinery Davidlohr Bueso
@ 2016-12-22 19:27   ` kbuild test robot
  2017-01-03 23:20     ` Davidlohr Bueso
  2016-12-22 19:55   ` kbuild test robot
  2017-01-16  1:32   ` Davidlohr Bueso
  2 siblings, 1 reply; 11+ messages in thread
From: kbuild test robot @ 2016-12-22 19:27 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: kbuild-all, mingo, peterz, oleg, linux-kernel, dave, Davidlohr Bueso


[-- Attachment #1: Type: text/plain, Size: 3251 bytes --]

Hi Davidlohr,

[auto build test ERROR on tip/auto-latest]
[also build test ERROR on v4.9 next-20161222]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Davidlohr-Bueso/sched-Introduce-rcuwait/20161223-020109
config: i386-randconfig-s1-201651 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: the linux-review/Davidlohr-Bueso/sched-Introduce-rcuwait/20161223-020109 HEAD 9e9d238f94d5aa8e348e7e70585533fe0dbd373b builds fine.
      It only hurts bisectability.

All error/warnings (new ones prefixed by >>):

>> kernel/exit.c:285:29: warning: 'struct rcuwait' declared inside parameter list will not be visible outside of this definition or declaration
    void rcuwait_trywake(struct rcuwait *w)
                                ^~~~~~~
   In file included from include/linux/srcu.h:33:0,
                    from include/linux/notifier.h:15,
                    from include/linux/memory_hotplug.h:6,
                    from include/linux/mmzone.h:751,
                    from include/linux/gfp.h:5,
                    from include/linux/mm.h:9,
                    from kernel/exit.c:7:
   kernel/exit.c: In function 'rcuwait_trywake':
>> kernel/exit.c:308:26: error: dereferencing pointer to incomplete type 'struct rcuwait'
     task = rcu_dereference(w->task);
                             ^
   include/linux/rcupdate.h:606:10: note: in definition of macro '__rcu_dereference_check'
     typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \
             ^
   include/linux/rcupdate.h:786:28: note: in expansion of macro 'rcu_dereference_check'
    #define rcu_dereference(p) rcu_dereference_check(p, 0)
                               ^~~~~~~~~~~~~~~~~~~~~
>> kernel/exit.c:308:9: note: in expansion of macro 'rcu_dereference'
     task = rcu_dereference(w->task);
            ^~~~~~~~~~~~~~~
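The warning and the errors share one cause: at line 285 no declaration of struct rcuwait is in scope (the kernel/exit.c hunk in this patch does not include <linux/rcuwait.h>), so the parameter list introduces a brand-new incomplete type whose scope ends with the function, and w->task cannot be dereferenced. A minimal user-space sketch of the fix, assuming a mirror of the struct with a void * placeholder for the task pointer (rcuwait_has_waiter() is a hypothetical helper, not kernel API):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Broken shape: with no prior declaration, the parameter list declares a
 * new incomplete type visible only inside this definition:
 *
 *   void rcuwait_trywake(struct rcuwait *w) { ... w->task ... }  // error
 *
 * Fix: make the complete type visible at file scope first, which is what
 * including the header provides. User-space mirror; the real field is a
 * struct task_struct pointer.
 */
struct rcuwait {
    void *task;
};

/* Hypothetical helper: the type is complete here, so w->task compiles. */
static int rcuwait_has_waiter(struct rcuwait *w)
{
    return w->task != NULL;
}
```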

vim +308 kernel/exit.c

   279		if (!sighand)
   280			return NULL;
   281	
   282		return task;
   283	}
   284	
 > 285	void rcuwait_trywake(struct rcuwait *w)
   286	{
   287		struct task_struct *task;
   288	
   289		rcu_read_lock();
   290	
   291		/*
   292		 * Order condition vs @task, such that everything prior to the load
   293		 * of @task is visible. This is the condition as to why the user called
   294		 * rcuwait_trywake() in the first place. Pairs with set_current_state()
   295		 * barrier (A) in rcuwait_wait_event().
   296		 *
   297		 *    WAIT                WAKE
   298		 *    [S] tsk = current	  [S] cond = true
   299		 *        MB (A)	      MB (B)
   300		 *    [L] cond		  [L] tsk
   301		 */
   302		smp_rmb(); /* (B) */
   303	
   304		/*
   305		 * Avoid using task_rcu_dereference() magic as long as we are careful,
   306		 * see comment in rcuwait_wait_event() regarding ->exit_state.
   307		 */
 > 308		task = rcu_dereference(w->task);
   309		if (task)
   310			wake_up_process(task);
   311		rcu_read_unlock();

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 33607 bytes --]


* [PATCH 1/2] sched: Introduce rcuwait machinery
  2016-12-22 17:01 [PATCH 0/2] sched: Introduce rcuwait Davidlohr Bueso
@ 2016-12-22 17:01 ` Davidlohr Bueso
  2016-12-22 19:27   ` kbuild test robot
                     ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Davidlohr Bueso @ 2016-12-22 17:01 UTC (permalink / raw)
  To: mingo, peterz, oleg; +Cc: linux-kernel, dave, Davidlohr Bueso

rcuwait provides support for (single) rcu-safe task wait/wake functionality,
with the caveat that it must not be called after exit_notify(), so that we
avoid racing with rcu delayed_put_task_struct callbacks; task_struct is not
rcu-aware in this context. For this case we similarly have the
task_rcu_dereference() magic, but its return semantics differ and can
conflict with the wakeup side.

The interfaces are quite straightforward:

rcuwait_wait_event()
rcuwait_trywake()

More details are in the comments, but it is worth mentioning that users must
provide proper serialization when waiting on a condition, so that they do not
corrupt a concurrent waiter. Care must also be taken with the ordering between
the task and the condition when calling the wakeup, as we cannot miss wakeups.
When porting users this is already a given with waitqueues, for example, since
everything is done under the q->lock.
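The wait/wake protocol described above can be sketched in user space, assuming pthreads, a POSIX semaphore and C11 atomics stand in for the scheduler and the kernel barriers. All demo_* names are hypothetical, not kernel API. The semaphore models sleep/wakeup: a post issued before the sleeper reaches sem_wait() is not lost, much as a wakeup between set_current_state() and schedule() leaves the task runnable.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task: the semaphore models sleep/wakeup. */
struct demo_task {
    sem_t wake;
};

/* Mirrors struct rcuwait: at most one waiter is published in ->task. */
struct demo_rcuwait {
    _Atomic(struct demo_task *) task;
};

static void demo_wait_event(struct demo_rcuwait *w, struct demo_task *self,
                            atomic_bool *cond)
{
    /* Publish ourselves; callers must serialize concurrent waiters. */
    atomic_store_explicit(&w->task, self, memory_order_relaxed);
    for (;;) {
        /* Full barrier (A): order the ->task store before the cond load. */
        atomic_thread_fence(memory_order_seq_cst);
        if (atomic_load_explicit(cond, memory_order_relaxed))
            break;
        /* Like schedule(): a post issued before this point is not lost. */
        sem_wait(&self->wake);
    }
    atomic_store_explicit(&w->task, NULL, memory_order_relaxed);
}

static void demo_wake_up(struct demo_rcuwait *w)
{
    /* Full barrier (B): order the caller's cond store before the ->task load. */
    atomic_thread_fence(memory_order_seq_cst);
    struct demo_task *t = atomic_load_explicit(&w->task, memory_order_relaxed);
    if (t)
        sem_post(&t->wake);
}

static struct demo_rcuwait demo_w;   /* zero-initialized: no waiter */
static struct demo_task demo_waiter; /* outlives both threads; RCU's job in the kernel */
static atomic_bool demo_cond;

static void *demo_waker(void *arg)
{
    (void)arg;
    atomic_store_explicit(&demo_cond, true, memory_order_relaxed);
    demo_wake_up(&demo_w);
    return NULL;
}

/* Returns 1 once the waiter has seen the condition and unpublished itself. */
static int run_wait_demo(void)
{
    pthread_t th;

    atomic_store(&demo_cond, false);
    sem_init(&demo_waiter.wake, 0, 0);
    pthread_create(&th, NULL, demo_waker, NULL);
    demo_wait_event(&demo_w, &demo_waiter, &demo_cond);
    pthread_join(th, NULL);
    sem_destroy(&demo_waiter.wake);
    return atomic_load(&demo_cond) && atomic_load(&demo_w.task) == NULL;
}
```

Only one waiter may be published at a time: a second concurrent demo_wait_event() would overwrite ->task and the first waiter could then miss its wakeup, which is why serialization is left to the caller. The waker may still hold the task pointer after the waiter has returned, so the task object must stay valid; in the kernel that is what the rcu read-side section plus the "not after exit_notify()" rule provide.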

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/rcuwait.h | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/exit.c           | 29 +++++++++++++++++++++++
 2 files changed, 92 insertions(+)
 create mode 100644 include/linux/rcuwait.h

diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h
new file mode 100644
index 000000000000..3e07beb14c1f
--- /dev/null
+++ b/include/linux/rcuwait.h
@@ -0,0 +1,63 @@
+#ifndef _LINUX_RCUWAIT_H_
+#define _LINUX_RCUWAIT_H_
+
+#include <linux/rcupdate.h>
+
+/*
+ * rcuwait provides a way of blocking and waking up a single
+ * task in an rcu-safe manner; it must not be used after
+ * exit_notify(). task_struct is not properly rcu protected,
+ * unless dealing with rcu-aware lists, i.e. find_task_by_*().
+ *
+ * Alternatively we have task_rcu_dereference(), but its return
+ * semantics have different implications which would break the
+ * wakeup side. @task is only non-NULL while a user is blocked
+ * (or checking whether it needs to block) on a condition; it is
+ * reset as soon as we know that the condition has succeeded and
+ * the task has been awoken.
+ */
+struct rcuwait {
+	struct task_struct *task;
+};
+
+#define __RCUWAIT_INITIALIZER(name)		\
+	{ .task = NULL, }
+
+static inline void rcuwait_init(struct rcuwait *w)
+{
+	w->task = NULL;
+}
+
+extern void rcuwait_trywake(struct rcuwait *w);
+
+/*
+ * The caller is responsible for locking around rcuwait_wait_event(),
+ * such that writes to @task are properly serialized.
+ */
+#define rcuwait_wait_event(w, condition)				\
+({									\
+	/*								\
+	 * Complain if we are called after do_exit()/exit_notify(),     \
+	 * as we cannot rely on the rcu critical region for the		\
+	 * wakeup side.							\
+	 */                                                             \
+	WARN_ON(current->exit_state);                                   \
+									\
+	rcu_assign_pointer((w)->task, current);				\
+	for (;;) {							\
+		/*							\
+		 * Implicit barrier (A) pairs with (B) in		\
+		 * rcuwait_trywake().					\
+		 */							\
+		set_current_state(TASK_UNINTERRUPTIBLE);		\
+		if (condition)						\
+			break;						\
+									\
+		schedule();						\
+	}								\
+									\
+	WRITE_ONCE((w)->task, NULL);					\
+	__set_current_state(TASK_RUNNING);				\
+})
+
+#endif /* _LINUX_RCUWAIT_H_ */
diff --git a/kernel/exit.c b/kernel/exit.c
index aacff8e2aec0..6862884179a8 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -282,6 +282,35 @@ struct task_struct *task_rcu_dereference(struct task_struct **ptask)
 	return task;
 }
 
+void rcuwait_trywake(struct rcuwait *w)
+{
+	struct task_struct *task;
+
+	rcu_read_lock();
+
+	/*
+	 * Order condition vs @task, such that everything prior to the load
+	 * of @task is visible. This is the condition as to why the user called
+	 * rcuwait_trywake() in the first place. Pairs with set_current_state()
+	 * barrier (A) in rcuwait_wait_event().
+	 *
+	 *    WAIT                WAKE
+	 *    [S] tsk = current	  [S] cond = true
+	 *        MB (A)	      MB (B)
+	 *    [L] cond		  [L] tsk
+	 */
+	smp_rmb(); /* (B) */
+
+	/*
+	 * Avoid using task_rcu_dereference() magic as long as we are careful,
+	 * see comment in rcuwait_wait_event() regarding ->exit_state.
+	 */
+	task = rcu_dereference(w->task);
+	if (task)
+		wake_up_process(task);
+	rcu_read_unlock();
+}
+
 struct task_struct *try_get_task_struct(struct task_struct **ptask)
 {
 	struct task_struct *task;
-- 
2.6.6
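The WAIT/WAKE diagram in rcuwait_trywake() is the classic store-buffering pattern: each side stores one location and then loads the location the other side stored. A minimal user-space litmus test, assuming C11 atomics and pthreads stand in for the kernel barriers (sb_task, sb_cond, wait_side, wake_side and count_lost_wakeups are hypothetical names), shows why both (A) and (B) must be full barriers: with a seq_cst fence between the store and the load on each side, at least one side must observe the other's store, so either the waiter sees the condition or the waker sees the task. Note that the code above issues smp_rmb() at (B), which orders loads against loads only, not the earlier store to the condition against the load of ->task; mainline later replaced it with smp_mb().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int sb_task;   /* stands in for w->task being published */
static atomic_int sb_cond;   /* stands in for the wakeup condition */
static int r_wait, r_wake;   /* what each side observed */

static void *wait_side(void *arg)
{
    (void)arg;
    atomic_store_explicit(&sb_task, 1, memory_order_relaxed);      /* [S] tsk = current */
    atomic_thread_fence(memory_order_seq_cst);                      /* MB (A) */
    r_wait = atomic_load_explicit(&sb_cond, memory_order_relaxed);  /* [L] cond */
    return NULL;
}

static void *wake_side(void *arg)
{
    (void)arg;
    atomic_store_explicit(&sb_cond, 1, memory_order_relaxed);       /* [S] cond = true */
    atomic_thread_fence(memory_order_seq_cst);                      /* MB (B), full */
    r_wake = atomic_load_explicit(&sb_task, memory_order_relaxed);  /* [L] tsk */
    return NULL;
}

/*
 * Counts runs where neither side saw the other's store, i.e. the waiter
 * misses the condition and the waker misses the task: a lost wakeup.
 * The C11 rules for seq_cst fences forbid that outcome, so this is 0.
 */
static int count_lost_wakeups(int runs)
{
    int lost = 0;

    for (int i = 0; i < runs; i++) {
        pthread_t a, b;

        atomic_store(&sb_task, 0);
        atomic_store(&sb_cond, 0);
        pthread_create(&a, NULL, wait_side, NULL);
        pthread_create(&b, NULL, wake_side, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r_wait == 0 && r_wake == 0)
            lost++;
    }
    return lost;
}
```

Weakening either fence (for example to an acquire fence, the rough analogue of a read barrier) removes that guarantee, even if a given run happens not to expose the failure.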


Thread overview: 11+ messages
2017-01-11 15:22 [PATCH v2 0/2] sched: Introduce rcuwait Davidlohr Bueso
2017-01-11 15:22 ` [PATCH 1/2] sched: Introduce rcuwait machinery Davidlohr Bueso
2017-01-14 12:32   ` [tip:locking/core] sched/wait, RCU: " tip-bot for Davidlohr Bueso
2017-01-11 15:22 ` [PATCH 2/2] locking/percpu-rwsem: Replace waitqueue with rcuwait Davidlohr Bueso
2017-01-14 12:33   ` [tip:locking/core] " tip-bot for Davidlohr Bueso
  -- strict thread matches above, loose matches on Subject: below --
2016-12-22 17:01 [PATCH 0/2] sched: Introduce rcuwait Davidlohr Bueso
2016-12-22 17:01 ` [PATCH 1/2] sched: Introduce rcuwait machinery Davidlohr Bueso
2016-12-22 19:27   ` kbuild test robot
2017-01-03 23:20     ` Davidlohr Bueso
2016-12-22 19:55   ` kbuild test robot
2017-01-16  1:32   ` Davidlohr Bueso
2017-01-17 17:41     ` Oleg Nesterov
