linux-kernel.vger.kernel.org archive mirror
* [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
@ 2016-12-13  8:36 Peter Zijlstra
  2016-12-13  8:36 ` [PATCH -v4 01/10] futex: Fix potential use-after-free in FUTEX_REQUEUE_PI Peter Zijlstra
                   ` (11 more replies)
  0 siblings, 12 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz


Hi all,

This is (I think) the 4th attempt at fixing this tiny pesky issue with
FUTEX_UNLOCK_PI, where we would really like to drop (and unboost) the rt_mutex
without holding hb->lock.

While going through the requeue PI code and thinking about how it all
worked, I realized we can avoid the entire problem I've been trying to
solve. That is, the 'problem' is that futex state and rt_mutex state can
end up disagreeing on who is waiting for the lock, and we muddle around
that with intricate state.

This series, well patch 8 really, avoids the entire problem by making
sure this inconsistent state does not occur. That then simplifies
everything -- assuming I got it right, of course :-)

The basic idea is, like requeue PI, to break the rt_mutex_lock()
function into pieces, such that we can enqueue the waiter while holding
hb->lock, wait for acquisition without holding hb->lock, and, on
failure, remove the waiter while holding hb->lock again.

That way, when we drop hb->lock to wait, futex and rt_mutex wait state is
consistent.


In any case, it passes our inadequate testing.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH -v4 01/10] futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-16 23:58   ` Darren Hart
  2016-12-13  8:36 ` [PATCH -v4 02/10] futex: Add missing error handling to FUTEX_REQUEUE_PI Peter Zijlstra
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peter_zijlstra-futex-fix_potential_use-after-free_in_futex_requeue_pi.patch --]
[-- Type: text/plain, Size: 2112 bytes --]

While working on the futex code, I stumbled over this potential
use-after-free scenario.

pi_mutex is a pointer into pi_state, whose reference we drop in
unqueue_me_pi(). So any access through that pointer after that point is
a use-after-free.

Since other sites already do rt_mutex_unlock() with hb->lock held (see
for example futex_lock_pi()), simply move the unlock before
unqueue_me_pi().

Cc: Ingo Molnar <mingo@kernel.org>
Cc: dvhart@infradead.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2813,7 +2813,6 @@ static int futex_wait_requeue_pi(u32 __u
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct rt_mutex_waiter rt_waiter;
-	struct rt_mutex *pi_mutex = NULL;
 	struct futex_hash_bucket *hb;
 	union futex_key key2 = FUTEX_KEY_INIT;
 	struct futex_q q = futex_q_init;
@@ -2905,6 +2904,8 @@ static int futex_wait_requeue_pi(u32 __u
 			spin_unlock(q.lock_ptr);
 		}
 	} else {
+		struct rt_mutex *pi_mutex;
+
 		/*
 		 * We have been woken up by futex_unlock_pi(), a timeout, or a
 		 * signal.  futex_unlock_pi() will not destroy the lock_ptr nor
@@ -2928,18 +2929,19 @@ static int futex_wait_requeue_pi(u32 __u
 		if (res)
 			ret = (res < 0) ? res : 0;
 
+		/*
+		 * If fixup_pi_state_owner() faulted and was unable to handle
+		 * the fault, unlock the rt_mutex and return the fault to
+		 * userspace.
+		 */
+		if (ret && rt_mutex_owner(pi_mutex) == current)
+			rt_mutex_unlock(pi_mutex);
+
 		/* Unqueue and drop the lock. */
 		unqueue_me_pi(&q);
 	}
 
-	/*
-	 * If fixup_pi_state_owner() faulted and was unable to handle the
-	 * fault, unlock the rt_mutex and return the fault to userspace.
-	 */
-	if (ret == -EFAULT) {
-		if (pi_mutex && rt_mutex_owner(pi_mutex) == current)
-			rt_mutex_unlock(pi_mutex);
-	} else if (ret == -EINTR) {
+	if (ret == -EINTR) {
 		/*
 		 * We've already been requeued, but cannot restart by calling
 		 * futex_lock_pi() directly. We could restart this syscall, but


* [PATCH -v4 02/10] futex: Add missing error handling to FUTEX_REQUEUE_PI
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
  2016-12-13  8:36 ` [PATCH -v4 01/10] futex: Fix potential use-after-free in FUTEX_REQUEUE_PI Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-17  0:06   ` Darren Hart
  2016-12-13  8:36 ` [PATCH -v4 03/10] futex: Cleanup variable names for futex_top_waiter() Peter Zijlstra
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peter_zijlstra-futex-fix_missing_error_handling.patch --]
[-- Type: text/plain, Size: 737 bytes --]

Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c |    2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2896,6 +2896,8 @@ static int futex_wait_requeue_pi(u32 __u
 		if (q.pi_state && (q.pi_state->owner != current)) {
 			spin_lock(q.lock_ptr);
 			ret = fixup_pi_state_owner(uaddr2, &q, current);
+			if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current)
+				rt_mutex_unlock(&q.pi_state->pi_mutex);
 			/*
 			 * Drop the reference to the pi state which
 			 * the requeue_pi() code acquired for us.


* [PATCH -v4 03/10] futex: Cleanup variable names for futex_top_waiter()
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
  2016-12-13  8:36 ` [PATCH -v4 01/10] futex: Fix potential use-after-free in FUTEX_REQUEUE_PI Peter Zijlstra
  2016-12-13  8:36 ` [PATCH -v4 02/10] futex: Add missing error handling to FUTEX_REQUEUE_PI Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-17  0:13   ` Darren Hart
  2016-12-13  8:36 ` [PATCH -v4 04/10] futex: Use smp_store_release() in mark_wake_futex() Peter Zijlstra
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peterz-futex-cleanup-top_waiter.patch --]
[-- Type: text/plain, Size: 3384 bytes --]

futex_top_waiter() returns the top-waiter on the pi_mutex. Assigning
this to a variable named 'match' totally obscures the code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c |   30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1120,14 +1120,14 @@ static int attach_to_pi_owner(u32 uval,
 static int lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 			   union futex_key *key, struct futex_pi_state **ps)
 {
-	struct futex_q *match = futex_top_waiter(hb, key);
+	struct futex_q *top_waiter = futex_top_waiter(hb, key);
 
 	/*
 	 * If there is a waiter on that futex, validate it and
 	 * attach to the pi_state when the validation succeeds.
 	 */
-	if (match)
-		return attach_to_pi_state(uval, match->pi_state, ps);
+	if (top_waiter)
+		return attach_to_pi_state(uval, top_waiter->pi_state, ps);
 
 	/*
 	 * We are the first waiter - try to look up the owner based on
@@ -1174,7 +1174,7 @@ static int futex_lock_pi_atomic(u32 __us
 				struct task_struct *task, int set_waiters)
 {
 	u32 uval, newval, vpid = task_pid_vnr(task);
-	struct futex_q *match;
+	struct futex_q *top_waiter;
 	int ret;
 
 	/*
@@ -1200,9 +1200,9 @@ static int futex_lock_pi_atomic(u32 __us
 	 * Lookup existing state first. If it exists, try to attach to
 	 * its pi_state.
 	 */
-	match = futex_top_waiter(hb, key);
-	if (match)
-		return attach_to_pi_state(uval, match->pi_state, ps);
+	top_waiter = futex_top_waiter(hb, key);
+	if (top_waiter)
+		return attach_to_pi_state(uval, top_waiter->pi_state, ps);
 
 	/*
 	 * No waiter and user TID is 0. We are here because the
@@ -1292,11 +1292,11 @@ static void mark_wake_futex(struct wake_
 	q->lock_ptr = NULL;
 }
 
-static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
+static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *top_waiter,
 			 struct futex_hash_bucket *hb)
 {
 	struct task_struct *new_owner;
-	struct futex_pi_state *pi_state = this->pi_state;
+	struct futex_pi_state *pi_state = top_waiter->pi_state;
 	u32 uninitialized_var(curval), newval;
 	DEFINE_WAKE_Q(wake_q);
 	bool deboost;
@@ -1317,11 +1317,11 @@ static int wake_futex_pi(u32 __user *uad
 
 	/*
 	 * It is possible that the next waiter (the one that brought
-	 * this owner to the kernel) timed out and is no longer
+	 * top_waiter owner to the kernel) timed out and is no longer
 	 * waiting on the lock.
 	 */
 	if (!new_owner)
-		new_owner = this->task;
+		new_owner = top_waiter->task;
 
 	/*
 	 * We pass it to the next owner. The WAITERS bit is always
@@ -2631,7 +2631,7 @@ static int futex_unlock_pi(u32 __user *u
 	u32 uninitialized_var(curval), uval, vpid = task_pid_vnr(current);
 	union futex_key key = FUTEX_KEY_INIT;
 	struct futex_hash_bucket *hb;
-	struct futex_q *match;
+	struct futex_q *top_waiter;
 	int ret;
 
 retry:
@@ -2655,9 +2655,9 @@ static int futex_unlock_pi(u32 __user *u
 	 * all and we at least want to know if user space fiddled
 	 * with the futex value instead of blindly unlocking.
 	 */
-	match = futex_top_waiter(hb, &key);
-	if (match) {
-		ret = wake_futex_pi(uaddr, uval, match, hb);
+	top_waiter = futex_top_waiter(hb, &key);
+	if (top_waiter) {
+		ret = wake_futex_pi(uaddr, uval, top_waiter, hb);
 		/*
 		 * In case of success wake_futex_pi dropped the hash
 		 * bucket lock.


* [PATCH -v4 04/10] futex: Use smp_store_release() in mark_wake_futex()
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
                   ` (2 preceding siblings ...)
  2016-12-13  8:36 ` [PATCH -v4 03/10] futex: Cleanup variable names for futex_top_waiter() Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-17  0:50   ` Darren Hart
  2016-12-13  8:36 ` [PATCH -v4 05/10] futex: Remove rt_mutex_deadlock_account_*() Peter Zijlstra
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peterz-futex-mark_wake_futex.patch --]
[-- Type: text/plain, Size: 697 bytes --]

Since the futex_q can disappear the instruction after assigning NULL,
this really should be a RELEASE barrier. That stops loads from hitting
dead memory too.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1288,8 +1288,7 @@ static void mark_wake_futex(struct wake_
 	 * memory barrier is required here to prevent the following
 	 * store to lock_ptr from getting ahead of the plist_del.
 	 */
-	smp_wmb();
-	q->lock_ptr = NULL;
+	smp_store_release(&q->lock_ptr, NULL);
 }
 
 static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *top_waiter,


* [PATCH -v4 05/10] futex: Remove rt_mutex_deadlock_account_*()
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
                   ` (3 preceding siblings ...)
  2016-12-13  8:36 ` [PATCH -v4 04/10] futex: Use smp_store_release() in mark_wake_futex() Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-13  8:36 ` [PATCH -v4 06/10] futex,rt_mutex: Provide futex specific rt_mutex API Peter Zijlstra
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peterz-futex-cleanup-deadlock.patch --]
[-- Type: text/plain, Size: 4950 bytes --]

These are unused and clutter up the code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rtmutex-debug.c |    9 -------
 kernel/locking/rtmutex-debug.h |    3 --
 kernel/locking/rtmutex.c       |   47 +++++++++++++++--------------------------
 kernel/locking/rtmutex.h       |    2 -
 4 files changed, 18 insertions(+), 43 deletions(-)

--- a/kernel/locking/rtmutex-debug.c
+++ b/kernel/locking/rtmutex-debug.c
@@ -173,12 +173,3 @@ void debug_rt_mutex_init(struct rt_mutex
 	lock->name = name;
 }
 
-void
-rt_mutex_deadlock_account_lock(struct rt_mutex *lock, struct task_struct *task)
-{
-}
-
-void rt_mutex_deadlock_account_unlock(struct task_struct *task)
-{
-}
-
--- a/kernel/locking/rtmutex-debug.h
+++ b/kernel/locking/rtmutex-debug.h
@@ -9,9 +9,6 @@
  * This file contains macros used solely by rtmutex.c. Debug version.
  */
 
-extern void
-rt_mutex_deadlock_account_lock(struct rt_mutex *lock, struct task_struct *task);
-extern void rt_mutex_deadlock_account_unlock(struct task_struct *task);
 extern void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter);
 extern void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter);
 extern void debug_rt_mutex_init(struct rt_mutex *lock, const char *name);
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -936,8 +936,6 @@ static int try_to_take_rt_mutex(struct r
 	 */
 	rt_mutex_set_owner(lock, task);
 
-	rt_mutex_deadlock_account_lock(lock, task);
-
 	return 1;
 }
 
@@ -1340,8 +1338,6 @@ static bool __sched rt_mutex_slowunlock(
 
 	debug_rt_mutex_unlock(lock);
 
-	rt_mutex_deadlock_account_unlock(current);
-
 	/*
 	 * We must be careful here if the fast path is enabled. If we
 	 * have no waiters queued we cannot set owner to NULL here
@@ -1407,11 +1403,10 @@ rt_mutex_fastlock(struct rt_mutex *lock,
 				struct hrtimer_sleeper *timeout,
 				enum rtmutex_chainwalk chwalk))
 {
-	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) {
-		rt_mutex_deadlock_account_lock(lock, current);
+	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 0;
-	} else
-		return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK);
+
+	return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK);
 }
 
 static inline int
@@ -1423,21 +1418,19 @@ rt_mutex_timed_fastlock(struct rt_mutex
 				      enum rtmutex_chainwalk chwalk))
 {
 	if (chwalk == RT_MUTEX_MIN_CHAINWALK &&
-	    likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) {
-		rt_mutex_deadlock_account_lock(lock, current);
+	    likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 0;
-	} else
-		return slowfn(lock, state, timeout, chwalk);
+
+	return slowfn(lock, state, timeout, chwalk);
 }
 
 static inline int
 rt_mutex_fasttrylock(struct rt_mutex *lock,
 		     int (*slowfn)(struct rt_mutex *lock))
 {
-	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) {
-		rt_mutex_deadlock_account_lock(lock, current);
+	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 1;
-	}
+
 	return slowfn(lock);
 }
 
@@ -1447,19 +1440,18 @@ rt_mutex_fastunlock(struct rt_mutex *loc
 				   struct wake_q_head *wqh))
 {
 	DEFINE_WAKE_Q(wake_q);
+	bool deboost;
 
-	if (likely(rt_mutex_cmpxchg_release(lock, current, NULL))) {
-		rt_mutex_deadlock_account_unlock(current);
+	if (likely(rt_mutex_cmpxchg_release(lock, current, NULL)))
+		return;
 
-	} else {
-		bool deboost = slowfn(lock, &wake_q);
+	deboost = slowfn(lock, &wake_q);
 
-		wake_up_q(&wake_q);
+	wake_up_q(&wake_q);
 
-		/* Undo pi boosting if necessary: */
-		if (deboost)
-			rt_mutex_adjust_prio(current);
-	}
+	/* Undo pi boosting if necessary: */
+	if (deboost)
+		rt_mutex_adjust_prio(current);
 }
 
 /**
@@ -1570,10 +1562,9 @@ EXPORT_SYMBOL_GPL(rt_mutex_unlock);
 bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock,
 				   struct wake_q_head *wqh)
 {
-	if (likely(rt_mutex_cmpxchg_release(lock, current, NULL))) {
-		rt_mutex_deadlock_account_unlock(current);
+	if (likely(rt_mutex_cmpxchg_release(lock, current, NULL)))
 		return false;
-	}
+
 	return rt_mutex_slowunlock(lock, wqh);
 }
 
@@ -1635,7 +1626,6 @@ void rt_mutex_init_proxy_locked(struct r
 	__rt_mutex_init(lock, NULL);
 	debug_rt_mutex_proxy_lock(lock, proxy_owner);
 	rt_mutex_set_owner(lock, proxy_owner);
-	rt_mutex_deadlock_account_lock(lock, proxy_owner);
 }
 
 /**
@@ -1655,7 +1645,6 @@ void rt_mutex_proxy_unlock(struct rt_mut
 {
 	debug_rt_mutex_proxy_unlock(lock);
 	rt_mutex_set_owner(lock, NULL);
-	rt_mutex_deadlock_account_unlock(proxy_owner);
 }
 
 /**
--- a/kernel/locking/rtmutex.h
+++ b/kernel/locking/rtmutex.h
@@ -11,8 +11,6 @@
  */
 
 #define rt_mutex_deadlock_check(l)			(0)
-#define rt_mutex_deadlock_account_lock(m, t)		do { } while (0)
-#define rt_mutex_deadlock_account_unlock(l)		do { } while (0)
 #define debug_rt_mutex_init_waiter(w)			do { } while (0)
 #define debug_rt_mutex_free_waiter(w)			do { } while (0)
 #define debug_rt_mutex_lock(l)				do { } while (0)


* [PATCH -v4 06/10] futex,rt_mutex: Provide futex specific rt_mutex API
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
                   ` (4 preceding siblings ...)
  2016-12-13  8:36 ` [PATCH -v4 05/10] futex: Remove rt_mutex_deadlock_account_*() Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-13  8:36 ` [PATCH -v4 07/10] futex: Change locking Peter Zijlstra
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peterz-futex-pi-unlock-1.patch --]
[-- Type: text/plain, Size: 6619 bytes --]

Part of what makes futex_unlock_pi() intricate is that
rt_mutex_futex_unlock() -> rt_mutex_slowunlock() can drop
rt_mutex::wait_lock.

This means we cannot rely on the atomicity of wait_lock, which we would
like to do in order to not rely on hb->lock so much.

The reason rt_mutex_slowunlock() needs to drop wait_lock is because it
can race with the rt_mutex fastpath, however futexes have their own
fast path.

Since futexes already have a bunch of separate rt_mutex accessors,
complete that set and implement an rt_mutex variant without fastpath for
them.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c                  |   30 ++++++++++-----------
 kernel/locking/rtmutex.c        |   55 +++++++++++++++++++++++++++++-----------
 kernel/locking/rtmutex_common.h |    9 +++++-
 3 files changed, 62 insertions(+), 32 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -914,7 +914,7 @@ void exit_pi_state_list(struct task_stru
 		pi_state->owner = NULL;
 		raw_spin_unlock_irq(&curr->pi_lock);
 
-		rt_mutex_unlock(&pi_state->pi_mutex);
+		rt_mutex_futex_unlock(&pi_state->pi_mutex);
 
 		spin_unlock(&hb->lock);
 
@@ -1362,20 +1362,18 @@ static int wake_futex_pi(u32 __user *uad
 	pi_state->owner = new_owner;
 	raw_spin_unlock(&new_owner->pi_lock);
 
-	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
-
-	deboost = rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q);
-
 	/*
-	 * First unlock HB so the waiter does not spin on it once he got woken
-	 * up. Second wake up the waiter before the priority is adjusted. If we
-	 * deboost first (and lose our higher priority), then the task might get
-	 * scheduled away before the wake up can take place.
+	 * We've updated the uservalue, this unlock cannot fail.
 	 */
+	deboost = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q);
+
+	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 	spin_unlock(&hb->lock);
-	wake_up_q(&wake_q);
-	if (deboost)
+
+	if (deboost) {
+		wake_up_q(&wake_q);
 		rt_mutex_adjust_prio(current);
+	}
 
 	return 0;
 }
@@ -2251,7 +2249,7 @@ static int fixup_owner(u32 __user *uaddr
 		 * task acquired the rt_mutex after we removed ourself from the
 		 * rt_mutex waiters list.
 		 */
-		if (rt_mutex_trylock(&q->pi_state->pi_mutex)) {
+		if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
 			locked = 1;
 			goto out;
 		}
@@ -2566,7 +2564,7 @@ static int futex_lock_pi(u32 __user *uad
 	if (!trylock) {
 		ret = rt_mutex_timed_futex_lock(&q.pi_state->pi_mutex, to);
 	} else {
-		ret = rt_mutex_trylock(&q.pi_state->pi_mutex);
+		ret = rt_mutex_futex_trylock(&q.pi_state->pi_mutex);
 		/* Fixup the trylock return value: */
 		ret = ret ? 0 : -EWOULDBLOCK;
 	}
@@ -2589,7 +2587,7 @@ static int futex_lock_pi(u32 __user *uad
 	 * it and return the fault to userspace.
 	 */
 	if (ret && (rt_mutex_owner(&q.pi_state->pi_mutex) == current))
-		rt_mutex_unlock(&q.pi_state->pi_mutex);
+		rt_mutex_futex_unlock(&q.pi_state->pi_mutex);
 
 	/* Unqueue and drop the lock */
 	unqueue_me_pi(&q);
@@ -2896,7 +2894,7 @@ static int futex_wait_requeue_pi(u32 __u
 			spin_lock(q.lock_ptr);
 			ret = fixup_pi_state_owner(uaddr2, &q, current);
 			if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current)
-				rt_mutex_unlock(&q.pi_state->pi_mutex);
+				rt_mutex_futex_unlock(&q.pi_state->pi_mutex);
 			/*
 			 * Drop the reference to the pi state which
 			 * the requeue_pi() code acquired for us.
@@ -2936,7 +2934,7 @@ static int futex_wait_requeue_pi(u32 __u
 		 * userspace.
 		 */
 		if (ret && rt_mutex_owner(pi_mutex) == current)
-			rt_mutex_unlock(pi_mutex);
+			rt_mutex_futex_unlock(pi_mutex);
 
 		/* Unqueue and drop the lock. */
 		unqueue_me_pi(&q);
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1486,15 +1486,23 @@ EXPORT_SYMBOL_GPL(rt_mutex_lock_interrup
 
 /*
  * Futex variant with full deadlock detection.
+ * Futex variants must not use the fast-path, see __rt_mutex_futex_unlock().
  */
-int rt_mutex_timed_futex_lock(struct rt_mutex *lock,
+int __sched rt_mutex_timed_futex_lock(struct rt_mutex *lock,
 			      struct hrtimer_sleeper *timeout)
 {
 	might_sleep();
 
-	return rt_mutex_timed_fastlock(lock, TASK_INTERRUPTIBLE, timeout,
-				       RT_MUTEX_FULL_CHAINWALK,
-				       rt_mutex_slowlock);
+	return rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE,
+				 timeout, RT_MUTEX_FULL_CHAINWALK);
+}
+
+/*
+ * Futex variant, must not use fastpath.
+ */
+int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+	return rt_mutex_slowtrylock(lock);
 }
 
 /**
@@ -1553,19 +1561,38 @@ void __sched rt_mutex_unlock(struct rt_m
 EXPORT_SYMBOL_GPL(rt_mutex_unlock);
 
 /**
- * rt_mutex_futex_unlock - Futex variant of rt_mutex_unlock
- * @lock: the rt_mutex to be unlocked
- *
- * Returns: true/false indicating whether priority adjustment is
- * required or not.
+ * Futex variant, that since futex variants do not use the fast-path, can be
+ * simple and will not need to retry.
  */
-bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock,
-				   struct wake_q_head *wqh)
+bool __sched __rt_mutex_futex_unlock(struct rt_mutex *lock,
+				    struct wake_q_head *wake_q)
+{
+	lockdep_assert_held(&lock->wait_lock);
+
+	debug_rt_mutex_unlock(lock);
+
+	if (!rt_mutex_has_waiters(lock)) {
+		lock->owner = NULL;
+		return false; /* done */
+	}
+
+	mark_wakeup_next_waiter(wake_q, lock);
+	return true; /* deboost and wakeups */
+}
+
+void __sched rt_mutex_futex_unlock(struct rt_mutex *lock)
 {
-	if (likely(rt_mutex_cmpxchg_release(lock, current, NULL)))
-		return false;
+	DEFINE_WAKE_Q(wake_q);
+	bool deboost;
 
-	return rt_mutex_slowunlock(lock, wqh);
+	raw_spin_lock_irq(&lock->wait_lock);
+	deboost = __rt_mutex_futex_unlock(lock, &wake_q);
+	raw_spin_unlock_irq(&lock->wait_lock);
+
+	if (deboost) {
+		wake_up_q(&wake_q);
+		rt_mutex_adjust_prio(current);
+	}
 }
 
 /**
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -108,9 +108,14 @@ extern int rt_mutex_start_proxy_lock(str
 extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
 				      struct hrtimer_sleeper *to,
 				      struct rt_mutex_waiter *waiter);
+
 extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
-extern bool rt_mutex_futex_unlock(struct rt_mutex *lock,
-				  struct wake_q_head *wqh);
+extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+
+extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
+extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,
+				 struct wake_q_head *wqh);
+
 extern void rt_mutex_adjust_prio(struct task_struct *task);
 
 #ifdef CONFIG_DEBUG_RT_MUTEXES


* [PATCH -v4 07/10] futex: Change locking
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
                   ` (5 preceding siblings ...)
  2016-12-13  8:36 ` [PATCH -v4 06/10] futex,rt_mutex: Provide futex specific rt_mutex API Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-13  8:36 ` [PATCH -v4 08/10] futex: Rework futex_lock_pi() vs rt_mutex_timed_futex_lock() Peter Zijlstra
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peterz-futex-pi-unlock-2.patch --]
[-- Type: text/plain, Size: 9937 bytes --]

Currently futex-pi relies on hb->lock to serialize everything. Since
hb->lock is giving us problems (PI inversions among other things, since
on -rt the hb->lock itself is an rt_mutex), we want to break this up a
bit.

This patch reworks and documents the locking. Notably, it
consistently uses rt_mutex::wait_lock to serialize {uval, pi_state}.
This would allow us to do rt_mutex_unlock() (including deboost)
without holding hb->lock.

Nothing yet relies on the new locking rules.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c |  153 +++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 123 insertions(+), 30 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -971,6 +971,39 @@ void exit_pi_state_list(struct task_stru
  *
  * [10] There is no transient state which leaves owner and user space
  *	TID out of sync.
+ *
+ *
+ * Serialization and lifetime rules:
+ *
+ * hb->lock:
+ *
+ *	hb -> futex_q, relation
+ *	futex_q -> pi_state, relation
+ *
+ *	(cannot be raw because hb can contain arbitrary amount
+ *	 of futex_q's)
+ *
+ * pi_mutex->wait_lock:
+ *
+ *	{uval, pi_state}
+ *
+ *	(and pi_mutex 'obviously')
+ *
+ * p->pi_lock:
+ *
+ *	p->pi_state_list -> pi_state->list, relation
+ *
+ * pi_state->refcount:
+ *
+ *	pi_state lifetime
+ *
+ *
+ * Lock order:
+ *
+ *   hb->lock
+ *     pi_mutex->wait_lock
+ *       p->pi_lock
+ *
  */
 
 /*
@@ -978,10 +1011,12 @@ void exit_pi_state_list(struct task_stru
  * the pi_state against the user space value. If correct, attach to
  * it.
  */
-static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state,
+static int attach_to_pi_state(u32 __user *uaddr, u32 uval,
+			      struct futex_pi_state *pi_state,
 			      struct futex_pi_state **ps)
 {
 	pid_t pid = uval & FUTEX_TID_MASK;
+	int ret, uval2;
 
 	/*
 	 * Userspace might have messed up non-PI and PI futexes [3]
@@ -989,9 +1024,34 @@ static int attach_to_pi_state(u32 uval,
 	if (unlikely(!pi_state))
 		return -EINVAL;
 
+	/*
+	 * We get here with hb->lock held, and having found a
+	 * futex_top_waiter(). This means that futex_lock_pi() of said futex_q
+	 * has dropped the hb->lock in between queue_me() and unqueue_me_pi(),
+	 * which in turn means that futex_lock_pi() still has a reference on
+	 * our pi_state.
+	 */
 	WARN_ON(!atomic_read(&pi_state->refcount));
 
 	/*
+	 * Now that we have a pi_state, we can acquire wait_lock
+	 * and do the state validation.
+	 */
+	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+
+	/*
+	 * Since {uval, pi_state} is serialized by wait_lock, and our current
+	 * uval was read without holding it, it can have changed. Verify it
+	 * still is what we expect it to be, otherwise retry the entire
+	 * operation.
+	 */
+	if (get_futex_value_locked(&uval2, uaddr))
+		goto out_efault;
+
+	if (uval != uval2)
+		goto out_eagain;
+
+	/*
 	 * Handle the owner died case:
 	 */
 	if (uval & FUTEX_OWNER_DIED) {
@@ -1006,11 +1066,11 @@ static int attach_to_pi_state(u32 uval,
 			 * is not 0. Inconsistent state. [5]
 			 */
 			if (pid)
-				return -EINVAL;
+				goto out_einval;
 			/*
 			 * Take a ref on the state and return success. [4]
 			 */
-			goto out_state;
+			goto out_attach;
 		}
 
 		/*
@@ -1022,14 +1082,14 @@ static int attach_to_pi_state(u32 uval,
 		 * Take a ref on the state and return success. [6]
 		 */
 		if (!pid)
-			goto out_state;
+			goto out_attach;
 	} else {
 		/*
 		 * If the owner died bit is not set, then the pi_state
 		 * must have an owner. [7]
 		 */
 		if (!pi_state->owner)
-			return -EINVAL;
+			goto out_einval;
 	}
 
 	/*
@@ -1038,11 +1098,29 @@ static int attach_to_pi_state(u32 uval,
 	 * user space TID. [9/10]
 	 */
 	if (pid != task_pid_vnr(pi_state->owner))
-		return -EINVAL;
-out_state:
+		goto out_einval;
+
+out_attach:
 	atomic_inc(&pi_state->refcount);
+	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 	*ps = pi_state;
 	return 0;
+
+out_einval:
+	ret = -EINVAL;
+	goto out_error;
+
+out_eagain:
+	ret = -EAGAIN;
+	goto out_error;
+
+out_efault:
+	ret = -EFAULT;
+	goto out_error;
+
+out_error:
+	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
+	return ret;
 }
 
 /*
@@ -1093,6 +1171,9 @@ static int attach_to_pi_owner(u32 uval,
 
 	/*
 	 * No existing pi state. First waiter. [2]
+	 *
+	 * This creates pi_state, we have hb->lock held, this means nothing can
+	 * observe this state, wait_lock is irrelevant.
 	 */
 	pi_state = alloc_pi_state();
 
@@ -1117,7 +1198,8 @@ static int attach_to_pi_owner(u32 uval,
 	return 0;
 }
 
-static int lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
+static int lookup_pi_state(u32 __user *uaddr, u32 uval,
+			   struct futex_hash_bucket *hb,
 			   union futex_key *key, struct futex_pi_state **ps)
 {
 	struct futex_q *top_waiter = futex_top_waiter(hb, key);
@@ -1127,7 +1209,7 @@ static int lookup_pi_state(u32 uval, str
 	 * attach to the pi_state when the validation succeeds.
 	 */
 	if (top_waiter)
-		return attach_to_pi_state(uval, top_waiter->pi_state, ps);
+		return attach_to_pi_state(uaddr, uval, top_waiter->pi_state, ps);
 
 	/*
 	 * We are the first waiter - try to look up the owner based on
@@ -1146,7 +1228,7 @@ static int lock_pi_update_atomic(u32 __u
 	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
 		return -EFAULT;
 
-	/*If user space value changed, let the caller retry */
+	/* If user space value changed, let the caller retry */
 	return curval != uval ? -EAGAIN : 0;
 }
 
@@ -1202,7 +1284,7 @@ static int futex_lock_pi_atomic(u32 __us
 	 */
 	top_waiter = futex_top_waiter(hb, key);
 	if (top_waiter)
-		return attach_to_pi_state(uval, top_waiter->pi_state, ps);
+		return attach_to_pi_state(uaddr, uval, top_waiter->pi_state, ps);
 
 	/*
 	 * No waiter and user TID is 0. We are here because the
@@ -1334,6 +1416,7 @@ static int wake_futex_pi(u32 __user *uad
 
 	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)) {
 		ret = -EFAULT;
+
 	} else if (curval != uval) {
 		/*
 		 * If a unconditional UNLOCK_PI operation (user space did not
@@ -1346,6 +1429,7 @@ static int wake_futex_pi(u32 __user *uad
 		else
 			ret = -EINVAL;
 	}
+
 	if (ret) {
 		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 		return ret;
@@ -1821,7 +1905,7 @@ static int futex_requeue(u32 __user *uad
 			 * If that call succeeds then we have pi_state and an
 			 * initial refcount on it.
 			 */
-			ret = lookup_pi_state(ret, hb2, &key2, &pi_state);
+			ret = lookup_pi_state(uaddr2, ret, hb2, &key2, &pi_state);
 		}
 
 		switch (ret) {
@@ -2120,10 +2204,13 @@ static int fixup_pi_state_owner(u32 __us
 {
 	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
-	struct task_struct *oldowner = pi_state->owner;
 	u32 uval, uninitialized_var(curval), newval;
+	struct task_struct *oldowner;
 	int ret;
 
+	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+
+	oldowner = pi_state->owner;
 	/* Owner died? */
 	if (!pi_state->owner)
 		newtid |= FUTEX_OWNER_DIED;
@@ -2139,11 +2226,10 @@ static int fixup_pi_state_owner(u32 __us
 	 * because we can fault here. Imagine swapped out pages or a fork
 	 * that marked all the anonymous memory readonly for cow.
 	 *
-	 * Modifying pi_state _before_ the user space value would
-	 * leave the pi_state in an inconsistent state when we fault
-	 * here, because we need to drop the hash bucket lock to
-	 * handle the fault. This might be observed in the PID check
-	 * in lookup_pi_state.
+	 * Modifying pi_state _before_ the user space value would leave the
+	 * pi_state in an inconsistent state when we fault here, because we
+	 * need to drop the locks to handle the fault. This might be observed
+	 * in the PID check in lookup_pi_state.
 	 */
 retry:
 	if (get_futex_value_locked(&uval, uaddr))
@@ -2164,36 +2250,43 @@ static int fixup_pi_state_owner(u32 __us
 	 * itself.
 	 */
 	if (pi_state->owner != NULL) {
-		raw_spin_lock_irq(&pi_state->owner->pi_lock);
+		raw_spin_lock(&pi_state->owner->pi_lock);
 		WARN_ON(list_empty(&pi_state->list));
 		list_del_init(&pi_state->list);
-		raw_spin_unlock_irq(&pi_state->owner->pi_lock);
+		raw_spin_unlock(&pi_state->owner->pi_lock);
 	}
 
 	pi_state->owner = newowner;
 
-	raw_spin_lock_irq(&newowner->pi_lock);
+	raw_spin_lock(&newowner->pi_lock);
 	WARN_ON(!list_empty(&pi_state->list));
 	list_add(&pi_state->list, &newowner->pi_state_list);
-	raw_spin_unlock_irq(&newowner->pi_lock);
+	raw_spin_unlock(&newowner->pi_lock);
+	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
+
 	return 0;
 
 	/*
-	 * To handle the page fault we need to drop the hash bucket
-	 * lock here. That gives the other task (either the highest priority
-	 * waiter itself or the task which stole the rtmutex) the
-	 * chance to try the fixup of the pi_state. So once we are
-	 * back from handling the fault we need to check the pi_state
-	 * after reacquiring the hash bucket lock and before trying to
-	 * do another fixup. When the fixup has been done already we
-	 * simply return.
+	 * To handle the page fault we need to drop the locks here. That gives
+	 * the other task (either the highest priority waiter itself or the
+	 * task which stole the rtmutex) the chance to try the fixup of the
+	 * pi_state. So once we are back from handling the fault we need to
+	 * check the pi_state after reacquiring the locks and before trying to
+	 * do another fixup. When the fixup has been done already we simply
+	 * return.
+	 *
+	 * Note: we hold both hb->lock and pi_mutex->wait_lock. We can safely
+	 * drop hb->lock since the caller owns the hb -> futex_q relation.
+	 * Dropping the pi_mutex->wait_lock requires the state revalidate.
 	 */
 handle_fault:
+	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 	spin_unlock(q->lock_ptr);
 
 	ret = fault_in_user_writeable(uaddr);
 
 	spin_lock(q->lock_ptr);
+	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 
 	/*
 	 * Check if someone else fixed it for us:

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH -v4 08/10] futex: Rework futex_lock_pi() vs rt_mutex_timed_futex_lock()
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
                   ` (6 preceding siblings ...)
  2016-12-13  8:36 ` [PATCH -v4 07/10] futex: Change locking Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-13  8:36 ` [PATCH -v4 09/10] futex: Remove inconsistent hb/rt_mutex state magic Peter Zijlstra
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peterz-futex-pi-unlock-5.patch --]
[-- Type: text/plain, Size: 9647 bytes --]

Currently we must drop hb->lock in order to call
rt_mutex_timed_futex_lock(), such that it can block and wait for lock
acquisition.

The problem with this is that at this point (and after a failed
acquire but before re-acquiring hb->lock) the hb queue and rt_mutex
wait list disagree on who is waiting to acquire the futex. This leads
to a 'funny' state in wake_futex_pi() which we then need to deal with
in fixup_owner().

This inconsistent state would become even more of a problem when we
move rt_mutex_unlock() out from under hb->lock, which we want to do
because of PI inversion issues.

Avoid the funny state by reusing the
rt_mutex_{start,finish}_proxy_lock() primitives that were created for
requeue-pi.

The idea is to enqueue the rt_mutex_waiter on the rt_mutex wait list
before dropping the hb->lock; this is what rt_mutex_start_proxy_lock()
allows.

We must then wait for the lock with hb->lock dropped; _however_, we
must not do remove_waiter() in case the lock acquire fails. This means
we need to split rt_mutex_finish_proxy_lock() into two parts: wait and
cleanup.

With this done we can do something along the lines of:

	__queue_me();
	rt_mutex_start_proxy_lock();
	spin_unlock(q.lock_ptr);

	ret = rt_mutex_wait_proxy_lock();

	spin_lock(q.lock_ptr);
	if (ret)
		rt_mutex_cleanup_proxy_lock();

And the hb queue and rt_mutex wait lists are always in sync, with the
one exception of rt_mutex_futex_unlock(), which removes the new owner
from the rt_mutex wait list while it remains on the hb queue until it
returns from rt_mutex_wait_proxy_lock().

This one case is fine, since unlock has no concurrency with itself.
That is, the new owner must finish and return from futex_lock_pi()
before it can do another unlock (through futex_unlock_pi()), at which
point its state is coherent again.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c                  |   78 ++++++++++++++++++++++++++++------------
 kernel/locking/rtmutex.c        |   47 +++++++++++++-----------
 kernel/locking/rtmutex_common.h |   10 +++--
 3 files changed, 88 insertions(+), 47 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2088,20 +2088,7 @@ queue_unlock(struct futex_hash_bucket *h
 	hb_waiters_dec(hb);
 }
 
-/**
- * queue_me() - Enqueue the futex_q on the futex_hash_bucket
- * @q:	The futex_q to enqueue
- * @hb:	The destination hash bucket
- *
- * The hb->lock must be held by the caller, and is released here. A call to
- * queue_me() is typically paired with exactly one call to unqueue_me().  The
- * exceptions involve the PI related operations, which may use unqueue_me_pi()
- * or nothing if the unqueue is done as part of the wake process and the unqueue
- * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
- * an example).
- */
-static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
-	__releases(&hb->lock)
+static inline void __queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
 {
 	int prio;
 
@@ -2118,6 +2105,24 @@ static inline void queue_me(struct futex
 	plist_node_init(&q->list, prio);
 	plist_add(&q->list, &hb->chain);
 	q->task = current;
+}
+
+/**
+ * queue_me() - Enqueue the futex_q on the futex_hash_bucket
+ * @q:	The futex_q to enqueue
+ * @hb:	The destination hash bucket
+ *
+ * The hb->lock must be held by the caller, and is released here. A call to
+ * queue_me() is typically paired with exactly one call to unqueue_me().  The
+ * exceptions involve the PI related operations, which may use unqueue_me_pi()
+ * or nothing if the unqueue is done as part of the wake process and the unqueue
+ * state is implicit in the state of woken task (see futex_wait_requeue_pi() for
+ * an example).
+ */
+static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
+	__releases(&hb->lock)
+{
+	__queue_me(q, hb);
 	spin_unlock(&hb->lock);
 }
 
@@ -2593,6 +2598,7 @@ static int futex_lock_pi(u32 __user *uad
 			 ktime_t *time, int trylock)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
+	struct rt_mutex_waiter rt_waiter;
 	struct futex_hash_bucket *hb;
 	struct futex_q q = futex_q_init;
 	int res, ret;
@@ -2645,25 +2651,49 @@ static int futex_lock_pi(u32 __user *uad
 		}
 	}
 
+	WARN_ON(!q.pi_state);
+
 	/*
 	 * Only actually queue now that the atomic ops are done:
 	 */
-	queue_me(&q, hb);
+	__queue_me(&q, hb);
 
-	WARN_ON(!q.pi_state);
-	/*
-	 * Block on the PI mutex:
-	 */
-	if (!trylock) {
-		ret = rt_mutex_timed_futex_lock(&q.pi_state->pi_mutex, to);
-	} else {
+	debug_rt_mutex_init_waiter(&rt_waiter);
+	RB_CLEAR_NODE(&rt_waiter.pi_tree_entry);
+	RB_CLEAR_NODE(&rt_waiter.tree_entry);
+	rt_waiter.task = NULL;
+
+	if (trylock) {
 		ret = rt_mutex_futex_trylock(&q.pi_state->pi_mutex);
 		/* Fixup the trylock return value: */
 		ret = ret ? 0 : -EWOULDBLOCK;
+		goto did_trylock;
 	}
 
+	/*
+	 * We must add ourselves to the rt_mutex waitlist while holding hb->lock
+	 * such that the hb and rt_mutex wait lists match.
+	 */
+	rt_mutex_start_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter, current);
+	spin_unlock(q.lock_ptr);
+
+	if (unlikely(to))
+		hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
+
+	ret = rt_mutex_wait_proxy_lock(&q.pi_state->pi_mutex, to, &rt_waiter);
+
 	spin_lock(q.lock_ptr);
 	/*
+	 * If we failed to acquire the lock (signal/timeout), we must
+	 * first acquire the hb->lock before removing the lock from the
+	 * rt_mutex waitqueue, such that we can keep the hb and rt_mutex
+	 * wait lists consistent.
+	 */
+	if (ret)
+		rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter);
+
+did_trylock:
+	/*
 	 * Fixup the pi_state owner and possibly acquire the lock if we
 	 * haven't already.
 	 */
@@ -3005,10 +3035,12 @@ static int futex_wait_requeue_pi(u32 __u
 		 */
 		WARN_ON(!q.pi_state);
 		pi_mutex = &q.pi_state->pi_mutex;
-		ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter);
+		ret = rt_mutex_wait_proxy_lock(pi_mutex, to, &rt_waiter);
 		debug_rt_mutex_free_waiter(&rt_waiter);
 
 		spin_lock(q.lock_ptr);
+		if (ret)
+			rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter);
 		/*
 		 * Fixup the pi_state owner and possibly acquire the lock if we
 		 * haven't already.
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1485,19 +1485,6 @@ int __sched rt_mutex_lock_interruptible(
 EXPORT_SYMBOL_GPL(rt_mutex_lock_interruptible);
 
 /*
- * Futex variant with full deadlock detection.
- * Futex variants must not use the fast-path, see __rt_mutex_futex_unlock().
- */
-int __sched rt_mutex_timed_futex_lock(struct rt_mutex *lock,
-			      struct hrtimer_sleeper *timeout)
-{
-	might_sleep();
-
-	return rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE,
-				 timeout, RT_MUTEX_FULL_CHAINWALK);
-}
-
-/*
  * Futex variant, must not use fastpath.
  */
 int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
@@ -1745,21 +1732,23 @@ struct task_struct *rt_mutex_next_owner(
 }
 
 /**
- * rt_mutex_finish_proxy_lock() - Complete lock acquisition
+ * rt_mutex_wait_proxy_lock() - Wait for lock acquisition
  * @lock:		the rt_mutex we were woken on
  * @to:			the timeout, null if none. hrtimer should already have
  *			been started.
  * @waiter:		the pre-initialized rt_mutex_waiter
  *
- * Complete the lock acquisition started our behalf by another thread.
+ * Wait for the lock acquisition started on our behalf by
+ * rt_mutex_start_proxy_lock(). Upon failure, the caller must call
+ * rt_mutex_cleanup_proxy_lock().
  *
  * Returns:
  *  0 - success
  * <0 - error, one of -EINTR, -ETIMEDOUT
  *
- * Special API call for PI-futex requeue support
+ * Special API call for PI-futex support
  */
-int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
+int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
 			       struct hrtimer_sleeper *to,
 			       struct rt_mutex_waiter *waiter)
 {
@@ -1772,9 +1761,6 @@ int rt_mutex_finish_proxy_lock(struct rt
 	/* sleep on the mutex */
 	ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter);
 
-	if (unlikely(ret))
-		remove_waiter(lock, waiter);
-
 	/*
 	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
 	 * have to fix that up.
@@ -1785,3 +1771,24 @@ int rt_mutex_finish_proxy_lock(struct rt
 
 	return ret;
 }
+
+/**
+ * rt_mutex_cleanup_proxy_lock() - Cleanup failed lock acquisition
+ * @lock:		the rt_mutex we were woken on
+ * @waiter:		the pre-initialized rt_mutex_waiter
+ *
+ * Clean up the failed lock acquisition as per rt_mutex_wait_proxy_lock().
+ *
+ * Special API call for PI-futex support
+ */
+void rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
+				 struct rt_mutex_waiter *waiter)
+{
+	raw_spin_lock_irq(&lock->wait_lock);
+
+	remove_waiter(lock, waiter);
+	fixup_rt_mutex_waiters(lock);
+
+	raw_spin_unlock_irq(&lock->wait_lock);
+}
+
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -105,11 +105,13 @@ extern void rt_mutex_proxy_unlock(struct
 extern int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 				     struct rt_mutex_waiter *waiter,
 				     struct task_struct *task);
-extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
-				      struct hrtimer_sleeper *to,
-				      struct rt_mutex_waiter *waiter);
 
-extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
+extern int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
+			       struct hrtimer_sleeper *to,
+			       struct rt_mutex_waiter *waiter);
+extern void rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
+				 struct rt_mutex_waiter *waiter);
+
 extern int rt_mutex_futex_trylock(struct rt_mutex *l);
 
 extern void rt_mutex_futex_unlock(struct rt_mutex *lock);

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH -v4 09/10] futex: Remove inconsistent hb/rt_mutex state magic
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
                   ` (7 preceding siblings ...)
  2016-12-13  8:36 ` [PATCH -v4 08/10] futex: Rework futex_lock_pi() vs rt_mutex_timed_futex_lock() Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-13  8:36 ` [PATCH -v4 10/10] futex: Pull rt_mutex_futex_unlock() out from under hb->lock Peter Zijlstra
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peterz-futex-pi-unlock-6.patch --]
[-- Type: text/plain, Size: 2139 bytes --]

Now that we guarantee hb queue and rt_mutex waiter state match up, we
no longer need to deal with the fallout of when they don't.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c |   38 +-------------------------------------
 1 file changed, 1 insertion(+), 37 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1396,13 +1396,7 @@ static int wake_futex_pi(u32 __user *uad
 	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
 
-	/*
-	 * It is possible that the next waiter (the one that brought
-	 * top_waiter owner to the kernel) timed out and is no longer
-	 * waiting on the lock.
-	 */
-	if (!new_owner)
-		new_owner = top_waiter->task;
+	BUG_ON(!new_owner);
 
 	/*
 	 * We pass it to the next owner. The WAITERS bit is always
@@ -2324,7 +2318,6 @@ static long futex_wait_restart(struct re
  */
 static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
 {
-	struct task_struct *owner;
 	int ret = 0;
 
 	if (locked) {
@@ -2337,35 +2330,6 @@ static int fixup_owner(u32 __user *uaddr
 		goto out;
 	}
 
-	/*
-	 * Catch the rare case, where the lock was released when we were on the
-	 * way back before we locked the hash bucket.
-	 */
-	if (q->pi_state->owner == current) {
-		/*
-		 * Try to get the rt_mutex now. This might fail as some other
-		 * task acquired the rt_mutex after we removed ourself from the
-		 * rt_mutex waiters list.
-		 */
-		if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
-			locked = 1;
-			goto out;
-		}
-
-		/*
-		 * pi_state is incorrect, some other task did a lock steal and
-		 * we returned due to timeout or signal without taking the
-		 * rt_mutex. Too late.
-		 */
-		raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
-		owner = rt_mutex_owner(&q->pi_state->pi_mutex);
-		if (!owner)
-			owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
-		raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
-		ret = fixup_pi_state_owner(uaddr, q, owner);
-		goto out;
-	}
-
 	/*
 	 * Paranoia check. If we did not take the lock, then we should not be
 	 * the owner of the rt_mutex.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH -v4 10/10] futex: Pull rt_mutex_futex_unlock() out from under hb->lock
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
                   ` (8 preceding siblings ...)
  2016-12-13  8:36 ` [PATCH -v4 09/10] futex: Remove inconsistent hb/rt_mutex state magic Peter Zijlstra
@ 2016-12-13  8:36 ` Peter Zijlstra
  2016-12-13 16:07 ` [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
  2016-12-16 23:31 ` Darren Hart
  11 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13  8:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart, peterz

[-- Attachment #1: peterz-futex-pi-unlock-3.patch --]
[-- Type: text/plain, Size: 11235 bytes --]

There are a number of 'interesting' problems, all caused by holding
hb->lock while doing the rt_mutex_unlock() equivalent.

Notably:

 - a PI inversion on hb->lock; and,

 - a DL crash because of pointer instability.

Because of all the previous patches that:

 - allow us to do rt_mutex_futex_unlock() without dropping wait_lock;
   which in turn allows us to rely on wait_lock atomicity.

 - changed locking rules to cover {uval,pi_state} with wait_lock.

 - removed the hb queue vs rt_mutex waiters inconsistency.

We can now quite simply pull rt_mutex_futex_unlock() out from under
hb->lock; a pi_state reference and wait_lock are sufficient.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/futex.c |  139 +++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 92 insertions(+), 47 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -800,7 +800,7 @@ static int refill_pi_state_cache(void)
 	return 0;
 }
 
-static struct futex_pi_state * alloc_pi_state(void)
+static struct futex_pi_state *alloc_pi_state(void)
 {
 	struct futex_pi_state *pi_state = current->pi_state_cache;
 
@@ -810,6 +810,11 @@ static struct futex_pi_state * alloc_pi_
 	return pi_state;
 }
 
+static void get_pi_state(struct futex_pi_state *pi_state)
+{
+	WARN_ON_ONCE(!atomic_inc_not_zero(&pi_state->refcount));
+}
+
 /*
  * Drops a reference to the pi_state object and frees or caches it
  * when the last reference is gone.
@@ -854,7 +859,7 @@ static void put_pi_state(struct futex_pi
  * Look up the task based on what TID userspace gave us.
  * We dont trust it.
  */
-static struct task_struct * futex_find_get_task(pid_t pid)
+static struct task_struct *futex_find_get_task(pid_t pid)
 {
 	struct task_struct *p;
 
@@ -914,10 +919,12 @@ void exit_pi_state_list(struct task_stru
 		pi_state->owner = NULL;
 		raw_spin_unlock_irq(&curr->pi_lock);
 
-		rt_mutex_futex_unlock(&pi_state->pi_mutex);
-
+		get_pi_state(pi_state);
 		spin_unlock(&hb->lock);
 
+		rt_mutex_futex_unlock(&pi_state->pi_mutex);
+		put_pi_state(pi_state);
+
 		raw_spin_lock_irq(&curr->pi_lock);
 	}
 	raw_spin_unlock_irq(&curr->pi_lock);
@@ -1030,6 +1037,9 @@ static int attach_to_pi_state(u32 __user
 	 * has dropped the hb->lock in between queue_me() and unqueue_me_pi(),
 	 * which in turn means that futex_lock_pi() still has a reference on
 	 * our pi_state.
+	 *
+	 * IOW, we cannot race against the unlocked put_pi_state() in
+	 * futex_unlock_pi().
 	 */
 	WARN_ON(!atomic_read(&pi_state->refcount));
 
@@ -1101,7 +1111,7 @@ static int attach_to_pi_state(u32 __user
 		goto out_einval;
 
 out_attach:
-	atomic_inc(&pi_state->refcount);
+	get_pi_state(pi_state);
 	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
 	*ps = pi_state;
 	return 0;
@@ -1373,35 +1383,22 @@ static void mark_wake_futex(struct wake_
 	smp_store_release(&q->lock_ptr, NULL);
 }
 
-static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *top_waiter,
-			 struct futex_hash_bucket *hb)
+static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_state)
 {
-	struct task_struct *new_owner;
-	struct futex_pi_state *pi_state = top_waiter->pi_state;
 	u32 uninitialized_var(curval), newval;
+	struct task_struct *new_owner;
+	bool deboost = false;
 	DEFINE_WAKE_Q(wake_q);
-	bool deboost;
 	int ret = 0;
 
-	if (!pi_state)
-		return -EINVAL;
-
-	/*
-	 * If current does not own the pi_state then the futex is
-	 * inconsistent and user space fiddled with the futex value.
-	 */
-	if (pi_state->owner != current)
-		return -EINVAL;
-
 	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
-
 	BUG_ON(!new_owner);
 
 	/*
-	 * We pass it to the next owner. The WAITERS bit is always
-	 * kept enabled while there is PI state around. We cleanup the
-	 * owner died bit, because we are the owner.
+	 * We pass it to the next owner. The WAITERS bit is always kept
+	 * enabled while there is PI state around. We cleanup the owner
+	 * died bit, because we are the owner.
 	 */
 	newval = FUTEX_WAITERS | task_pid_vnr(new_owner);
 
@@ -1424,10 +1421,8 @@ static int wake_futex_pi(u32 __user *uad
 			ret = -EINVAL;
 	}
 
-	if (ret) {
-		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
-		return ret;
-	}
+	if (ret)
+		goto out_unlock;
 
 	raw_spin_lock(&pi_state->owner->pi_lock);
 	WARN_ON(list_empty(&pi_state->list));
@@ -1445,15 +1440,15 @@ static int wake_futex_pi(u32 __user *uad
 	 */
 	deboost = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q);
 
+out_unlock:
 	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
-	spin_unlock(&hb->lock);
 
 	if (deboost) {
 		wake_up_q(&wake_q);
 		rt_mutex_adjust_prio(current);
 	}
 
-	return 0;
+	return ret;
 }
 
 /*
@@ -1982,7 +1977,7 @@ static int futex_requeue(u32 __user *uad
 			 * refcount on the pi_state and store the pointer in
 			 * the futex_q object of the waiter.
 			 */
-			atomic_inc(&pi_state->refcount);
+			get_pi_state(pi_state);
 			this->pi_state = pi_state;
 			ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
 							this->rt_waiter,
@@ -2217,7 +2212,8 @@ static int fixup_pi_state_owner(u32 __us
 	/*
 	 * We are here either because we stole the rtmutex from the
 	 * previous highest priority waiter or we are the highest priority
-	 * waiter but failed to get the rtmutex the first time.
+	 * waiter but have failed to get the rtmutex the first time.
+	 *
 	 * We have to replace the newowner TID in the user space variable.
 	 * This must be atomic as we have to preserve the owner died bit here.
 	 *
@@ -2234,7 +2230,7 @@ static int fixup_pi_state_owner(u32 __us
 	if (get_futex_value_locked(&uval, uaddr))
 		goto handle_fault;
 
-	while (1) {
+	for (;;) {
 		newval = (uval & FUTEX_OWNER_DIED) | newtid;
 
 		if (cmpxchg_futex_value_locked(&curval, uaddr, uval, newval))
@@ -2324,6 +2320,10 @@ static int fixup_owner(u32 __user *uaddr
 		/*
 		 * Got the lock. We might not be the anticipated owner if we
 		 * did a lock-steal - fix up the PI-state in that case:
+		 *
+		 * We can safely read pi_state->owner without holding wait_lock
+		 * because we now own the rt_mutex, only the owner will attempt
+		 * to change it.
 		 */
 		if (q->pi_state->owner != current)
 			ret = fixup_pi_state_owner(uaddr, q, current);
@@ -2334,11 +2334,12 @@ static int fixup_owner(u32 __user *uaddr
 	 * Paranoia check. If we did not take the lock, then we should not be
 	 * the owner of the rt_mutex.
 	 */
-	if (rt_mutex_owner(&q->pi_state->pi_mutex) == current)
+	if (rt_mutex_owner(&q->pi_state->pi_mutex) == current) {
 		printk(KERN_ERR "fixup_owner: ret = %d pi-mutex: %p "
 				"pi-state %p\n", ret,
 				q->pi_state->pi_mutex.owner,
 				q->pi_state->owner);
+	}
 
 out:
 	return ret ? ret : locked;
@@ -2562,6 +2563,7 @@ static int futex_lock_pi(u32 __user *uad
 			 ktime_t *time, int trylock)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
+	struct futex_pi_state *pi_state = NULL;
 	struct rt_mutex_waiter rt_waiter;
 	struct futex_hash_bucket *hb;
 	struct futex_q q = futex_q_init;
@@ -2673,12 +2675,19 @@ static int futex_lock_pi(u32 __user *uad
 	 * If fixup_owner() faulted and was unable to handle the fault, unlock
 	 * it and return the fault to userspace.
 	 */
-	if (ret && (rt_mutex_owner(&q.pi_state->pi_mutex) == current))
-		rt_mutex_futex_unlock(&q.pi_state->pi_mutex);
+	if (ret && (rt_mutex_owner(&q.pi_state->pi_mutex) == current)) {
+		pi_state = q.pi_state;
+		get_pi_state(pi_state);
+	}
 
 	/* Unqueue and drop the lock */
 	unqueue_me_pi(&q);
 
+	if (pi_state) {
+		rt_mutex_futex_unlock(&pi_state->pi_mutex);
+		put_pi_state(pi_state);
+	}
+
 	goto out_put_key;
 
 out_unlock_put_key:
@@ -2741,10 +2750,36 @@ static int futex_unlock_pi(u32 __user *u
 	 */
 	top_waiter = futex_top_waiter(hb, &key);
 	if (top_waiter) {
-		ret = wake_futex_pi(uaddr, uval, top_waiter, hb);
+		struct futex_pi_state *pi_state = top_waiter->pi_state;
+
+		ret = -EINVAL;
+		if (!pi_state)
+			goto out_unlock;
+
+		/*
+		 * If current does not own the pi_state then the futex is
+		 * inconsistent and user space fiddled with the futex value.
+		 */
+		if (pi_state->owner != current)
+			goto out_unlock;
+
+		/*
+		 * Grab a reference on the pi_state and drop hb->lock.
+		 *
+		 * The reference ensures pi_state lives, dropping the hb->lock
+		 * is tricky.. wake_futex_pi() will take rt_mutex::wait_lock to
+		 * close the races against futex_lock_pi(), but in case of
+		 * _any_ fail we'll abort and retry the whole deal.
+		 */
+		get_pi_state(pi_state);
+		spin_unlock(&hb->lock);
+
+		ret = wake_futex_pi(uaddr, uval, pi_state);
+
+		put_pi_state(pi_state);
+
 		/*
-		 * In case of success wake_futex_pi dropped the hash
-		 * bucket lock.
+		 * Success, we're done! No tricky corner cases.
 		 */
 		if (!ret)
 			goto out_putkey;
@@ -2759,7 +2794,6 @@ static int futex_unlock_pi(u32 __user *u
 		 * setting the FUTEX_WAITERS bit. Try again.
 		 */
 		if (ret == -EAGAIN) {
-			spin_unlock(&hb->lock);
 			put_futex_key(&key);
 			goto retry;
 		}
@@ -2767,7 +2801,7 @@ static int futex_unlock_pi(u32 __user *u
 		 * wake_futex_pi has detected invalid state. Tell user
 		 * space.
 		 */
-		goto out_unlock;
+		goto out_putkey;
 	}
 
 	/*
@@ -2777,8 +2811,10 @@ static int futex_unlock_pi(u32 __user *u
 	 * preserve the WAITERS bit not the OWNER_DIED one. We are the
 	 * owner.
 	 */
-	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))
+	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0)) {
+		spin_unlock(&hb->lock);
 		goto pi_faulted;
+	}
 
 	/*
 	 * If uval has changed, let user space handle it.
@@ -2792,7 +2828,6 @@ static int futex_unlock_pi(u32 __user *u
 	return ret;
 
 pi_faulted:
-	spin_unlock(&hb->lock);
 	put_futex_key(&key);
 
 	ret = fault_in_user_writeable(uaddr);
@@ -2896,6 +2931,7 @@ static int futex_wait_requeue_pi(u32 __u
 				 u32 __user *uaddr2)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
+	struct futex_pi_state *pi_state = NULL;
 	struct rt_mutex_waiter rt_waiter;
 	struct futex_hash_bucket *hb;
 	union futex_key key2 = FUTEX_KEY_INIT;
@@ -2980,8 +3016,10 @@ static int futex_wait_requeue_pi(u32 __u
 		if (q.pi_state && (q.pi_state->owner != current)) {
 			spin_lock(q.lock_ptr);
 			ret = fixup_pi_state_owner(uaddr2, &q, current);
-			if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current)
-				rt_mutex_futex_unlock(&q.pi_state->pi_mutex);
+			if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current) {
+				pi_state = q.pi_state;
+				get_pi_state(pi_state);
+			}
 			/*
 			 * Drop the reference to the pi state which
 			 * the requeue_pi() code acquired for us.
@@ -3022,13 +3060,20 @@ static int futex_wait_requeue_pi(u32 __u
 		 * the fault, unlock the rt_mutex and return the fault to
 		 * userspace.
 		 */
-		if (ret && rt_mutex_owner(pi_mutex) == current)
-			rt_mutex_futex_unlock(pi_mutex);
+		if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current) {
+			pi_state = q.pi_state;
+			get_pi_state(pi_state);
+		}
 
 		/* Unqueue and drop the lock. */
 		unqueue_me_pi(&q);
 	}
 
+	if (pi_state) {
+		rt_mutex_futex_unlock(&pi_state->pi_mutex);
+		put_pi_state(pi_state);
+	}
+
 	if (ret == -EINTR) {
 		/*
 		 * We've already been requeued, but cannot restart by calling

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
                   ` (9 preceding siblings ...)
  2016-12-13  8:36 ` [PATCH -v4 10/10] futex: Pull rt_mutex_futex_unlock() out from under hb->lock Peter Zijlstra
@ 2016-12-13 16:07 ` Peter Zijlstra
  2017-02-22 11:02   ` Peter Zijlstra
  2016-12-16 23:31 ` Darren Hart
  11 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-13 16:07 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart

On Tue, Dec 13, 2016 at 09:36:38AM +0100, Peter Zijlstra wrote:

> The basic idea is to, like requeue PI, break the rt_mutex_lock() function into
> pieces, such that we can enqueue the waiter while holding hb->lock, wait for
> acquisition without hb->lock and can remove the waiter, on failure, while
> holding hb->lock again.
> 
> That way, when we drop hb->lock to wait, futex and rt_mutex wait state is
> consistent.

And of course, there's a hole in...

There is a point in futex_unlock_pi() where we hold neither hb->lock
nor wait_lock; at that point a futex_lock_pi() that had failed its
rt_mutex_wait_proxy_lock() can sneak in and remove itself, even though
we saw its waiter, recreating a variant of the initial problem.

The below plugs the hole, but it's rather fragile in that it relies on
overlapping critical sections and the specific detail that we call
rt_mutex_cleanup_proxy_lock() immediately after (re)acquiring hb->lock.

There is another solution, but that's more involved and uglier still.

I'll give it a bit more thought.


---
 kernel/futex.c                  |   36 +++++++++++++++++++++++++-----------
 kernel/locking/rtmutex.c        |   21 ++++++++++++++++++---
 kernel/locking/rtmutex_common.h |    2 +-
 3 files changed, 44 insertions(+), 15 deletions(-)

--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1384,6 +1384,7 @@ static void mark_wake_futex(struct wake_
 }
 
 static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_pi_state *pi_state)
+	__releases(&pi_state->pi_mutex.wait_lock)
 {
 	u32 uninitialized_var(curval), newval;
 	struct task_struct *new_owner;
@@ -1391,7 +1392,8 @@ static int wake_futex_pi(u32 __user *uad
 	DEFINE_WAKE_Q(wake_q);
 	int ret = 0;
 
-	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+	lockdep_assert_held(&pi_state->pi_mutex.wait_lock);
+
 	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
 	BUG_ON(!new_owner);
 
@@ -2655,8 +2657,8 @@ static int futex_lock_pi(u32 __user *uad
 	 * rt_mutex waitqueue, such that we can keep the hb and rt_mutex
 	 * wait lists consistent.
 	 */
-	if (ret)
-		rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter);
+	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
+		ret = 0;
 
 did_trylock:
 	/*
@@ -2763,15 +2765,26 @@ static int futex_unlock_pi(u32 __user *u
 		if (pi_state->owner != current)
 			goto out_unlock;
 
+		get_pi_state(pi_state);
+
 		/*
-		 * Grab a reference on the pi_state and drop hb->lock.
+		 * We must grab wait_lock _before_ dropping hb->lock, such that
+		 * the critical sections overlap. Without this there is a hole
+		 * in which futex_lock_pi()'s rt_mutex_wait_proxy_lock() can
+		 * fail, re-acquire the hb->lock and wait_lock and have our
+		 * top_waiter disappear.
+		 */
+		raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
+		/*
+		 * Now that we have a reference on pi_state and hold wait_lock
+		 * we can drop hb->lock without risk of a waiter disappearing
+		 * on us.
 		 *
-		 * The reference ensures pi_state lives, dropping the hb->lock
-		 * is tricky.. wake_futex_pi() will take rt_mutex::wait_lock to
-		 * close the races against futex_lock_pi(), but in case of
-		 * _any_ fail we'll abort and retry the whole deal.
+		 * Even if rt_mutex_wait_proxy_lock() fails, us holding
+		 * wait_lock ensures it cannot be removed and the
+		 * rt_mutex_cleanup_proxy_lock() call will find it owns the
+		 * lock anyway.
 		 */
-		get_pi_state(pi_state);
 		spin_unlock(&hb->lock);
 
 		ret = wake_futex_pi(uaddr, uval, pi_state);
@@ -3041,8 +3054,9 @@ static int futex_wait_requeue_pi(u32 __u
 		debug_rt_mutex_free_waiter(&rt_waiter);
 
 		spin_lock(q.lock_ptr);
-		if (ret)
-			rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter);
+		if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter))
+			ret = 0;
+
 		/*
 		 * Fixup the pi_state owner and possibly acquire the lock if we
 		 * haven't already.
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1779,16 +1779,31 @@ int rt_mutex_wait_proxy_lock(struct rt_m
  *
  * Clean up the failed lock acquisition as per rt_mutex_wait_proxy_lock().
  *
+ * Returns:
+ *  true - did the cleanup, we are done.
+ *  false - we acquired the lock anyway, after rt_mutex_wait_proxy_lock(),
+ *          caller should disregard its return value.
+ *
  * Special API call for PI-futex support
  */
-void rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
+bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 				 struct rt_mutex_waiter *waiter)
 {
+	bool cleanup = false;
+
 	raw_spin_lock_irq(&lock->wait_lock);
 
-	remove_waiter(lock, waiter);
-	fixup_rt_mutex_waiters(lock);
+	/*
+	 * Check if we got the lock anyway...
+	 */
+	if (rt_mutex_owner(lock) != current) {
+		remove_waiter(lock, waiter);
+		fixup_rt_mutex_waiters(lock);
+		cleanup = true;
+	}
 
 	raw_spin_unlock_irq(&lock->wait_lock);
+
+	return cleanup;
 }
 
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -109,7 +109,7 @@ extern int rt_mutex_start_proxy_lock(str
 extern int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
 			       struct hrtimer_sleeper *to,
 			       struct rt_mutex_waiter *waiter);
-extern void rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
+extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 				 struct rt_mutex_waiter *waiter);
 
 extern int rt_mutex_futex_trylock(struct rt_mutex *l);

^ permalink raw reply	[flat|nested] 28+ messages in thread
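[Editorial note: the "overlapping critical sections" trick above can be sketched in userspace terms. The following is an illustrative analogy only, with pthread mutexes standing in for hb->lock and pi_mutex.wait_lock and a plain flag for the waiter list; none of these names are kernel APIs.]

```c
#include <assert.h>
#include <pthread.h>

/*
 * Userspace sketch (not kernel code) of the overlapping critical
 * sections the patch above relies on: take the inner wait_lock
 * analogue _before_ dropping the hb->lock analogue, so there is no
 * window in which a third party can change the waiter state while we
 * hold neither lock. All names here are illustrative.
 */
static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;   /* stands in for hb->lock */
static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for wait_lock */
static int top_waiter_present;

/* Racy ordering: a hole exists between unlock(hb) and lock(wait). */
int unlock_with_hole(void)
{
	pthread_mutex_lock(&hb_lock);
	int saw_waiter = top_waiter_present;
	pthread_mutex_unlock(&hb_lock);
	/* <-- hole: a concurrent waiter could remove itself here */
	pthread_mutex_lock(&wait_lock);
	int still_there = top_waiter_present;
	pthread_mutex_unlock(&wait_lock);
	return saw_waiter == still_there;
}

/* Overlapped ordering: wait_lock is taken before hb_lock is dropped. */
int unlock_overlapped(void)
{
	pthread_mutex_lock(&hb_lock);
	int saw_waiter = top_waiter_present;
	pthread_mutex_lock(&wait_lock);	/* overlap starts here */
	pthread_mutex_unlock(&hb_lock);	/* safe: wait_lock already held */
	int still_there = top_waiter_present; /* unchanged, if all writers take wait_lock */
	pthread_mutex_unlock(&wait_lock);
	return saw_waiter == still_there;
}
```

The point of the second variant is that any path that removes a waiter must take one of the two locks, and we hold at least one of them at every instant.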

* Re: [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
  2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
                   ` (10 preceding siblings ...)
  2016-12-13 16:07 ` [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
@ 2016-12-16 23:31 ` Darren Hart
  2016-12-17 13:52   ` Peter Zijlstra
  11 siblings, 1 reply; 28+ messages in thread
From: Darren Hart @ 2016-12-16 23:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot

On Tue, Dec 13, 2016 at 09:36:38AM +0100, Peter Zijlstra wrote:
> 
> Hi all,
> 
> This is (I think) the 4th attempt at fixing this tiny pesky issue with
> FUTEX_UNLOCK_PI, where we would really like to drop (and unboost) the rt_mutex
> without holding hb->lock.
> 
> While going through the requeue PI code and thinking about how all that worked
> I realized we can avoid the entire problem I've been trying to solve. That is,
> the 'problem' is that futex state and rt_mutex state can end up disagreeing on
> who is waiting for the lock and we muddle around that with intricate state.
> 
> This series, well patch 8, avoids the entire problem by making sure this
> inconsistent state does not occur. Which then simplifies everything -- assuming
> I got it right of course :-)
> 
> The basic idea is to, like requeue PI, break the rt_mutex_lock() function into
> pieces, such that we can enqueue the waiter while holding hb->lock, wait for
> acquisition without hb->lock and can remove the waiter, on failure, while
> holding hb->lock again.

Oh boy, this is going to take some brain space/time. I'll comment as I work
through them and ask questions - to keep the dialog going.

> 
> That way, when we drop hb->lock to wait, futex and rt_mutex wait state is
> consistent.
> 
> 
> In any case, it passes our inadequate testing.

It passed my CI tools/testing/selftests/futex/functional/run.sh. Did you also
happen to run a fuzz tester?

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 01/10] futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
  2016-12-13  8:36 ` [PATCH -v4 01/10] futex: Fix potential use-after-free in FUTEX_REQUEUE_PI Peter Zijlstra
@ 2016-12-16 23:58   ` Darren Hart
  0 siblings, 0 replies; 28+ messages in thread
From: Darren Hart @ 2016-12-16 23:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot

On Tue, Dec 13, 2016 at 09:36:39AM +0100, Peter Zijlstra wrote:
> While working on the futex code, I stumbled over this potential
> use-after-free scenario.
> 
> pi_mutex is a pointer into pi_state, which we drop the reference on in
> unqueue_me_pi(). So any access to that pointer after that is bad.
> 
> Since other sites already do rt_mutex_unlock() with hb->lock held, see
> for example futex_lock_pi(), simply move the unlock before
> unqueue_me_pi().

I believe this is unchanged from the previous version I reviewed. My only
comment would be to note in the commit message that, in addition to EFAULT,
the ENOMEM and EHWPOISON errors from fixup_owner...fixup_user_fault() will now
also result in the unlock. We saw no reason to handle those cases differently,
but in case someone goes looking for the change, it is good to have it
documented.

This passed my minimal testing (selftests/futex/functional in CI with kvm on a
dual socket Xeon "cpus=20,sockets=2,cores=5,threads=2").

Reviewed-by: Darren Hart <dvhart@linux.intel.com>

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 02/10] futex: Add missing error handling to FUTEX_REQUEUE_PI
  2016-12-13  8:36 ` [PATCH -v4 02/10] futex: Add missing error handling to FUTEX_REQUEUE_PI Peter Zijlstra
@ 2016-12-17  0:06   ` Darren Hart
  2016-12-17 13:54     ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Darren Hart @ 2016-12-17  0:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot

On Tue, Dec 13, 2016 at 09:36:40AM +0100, Peter Zijlstra wrote:
> Thomas spotted that fixup_pi_state_owner() can return errors and we
> fail to unlock the rt_mutex in that case.
> 

We handled this explicitly before Patch 1/10, so can this be rolled into 1/10
(er 9) as a single commit?

> Reported-by: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/futex.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2896,6 +2896,8 @@ static int futex_wait_requeue_pi(u32 __u
>  		if (q.pi_state && (q.pi_state->owner != current)) {
>  			spin_lock(q.lock_ptr);
>  			ret = fixup_pi_state_owner(uaddr2, &q, current);
> +			if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current)
> +				rt_mutex_unlock(&q.pi_state->pi_mutex);
>  			/*
>  			 * Drop the reference to the pi state which
>  			 * the requeue_pi() code acquired for us.
> 
> 
> 

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 03/10] futex: Cleanup variable names for futex_top_waiter()
  2016-12-13  8:36 ` [PATCH -v4 03/10] futex: Cleanup variable names for futex_top_waiter() Peter Zijlstra
@ 2016-12-17  0:13   ` Darren Hart
  2017-02-22 14:07     ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Darren Hart @ 2016-12-17  0:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot

On Tue, Dec 13, 2016 at 09:36:41AM +0100, Peter Zijlstra wrote:
> futex_top_waiter() returns the top-waiter on the pi_mutex. Assigning
> this to a variable 'match' totally obscures the code.
> 

Yes please.

One wording nit...

> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/futex.c |   30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
> 
> --- a/kernel/futex.c
> +++ b/kernel/futex.c

> @@ -1317,11 +1317,11 @@ static int wake_futex_pi(u32 __user *uad
>  
>  	/*
>  	 * It is possible that the next waiter (the one that brought
> -	 * this owner to the kernel) timed out and is no longer
> +	 * top_waiter owner to the kernel) timed out and is no longer
>  	 * waiting on the lock.
>  	 */

This breaks my parser (and did before too).

Consider:

/*
 * It is possible that the next waiter (that caused top_waiter to call
 * into the kernel) has since timed out and is no longer waiting on the
 * lock.
 */

Is that clearer?

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 04/10] futex: Use smp_store_release() in mark_wake_futex()
  2016-12-13  8:36 ` [PATCH -v4 04/10] futex: Use smp_store_release() in mark_wake_futex() Peter Zijlstra
@ 2016-12-17  0:50   ` Darren Hart
  2017-02-22 14:03     ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Darren Hart @ 2016-12-17  0:50 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, paulmck

On Tue, Dec 13, 2016 at 09:36:42AM +0100, Peter Zijlstra wrote:
> Since the futex_q can disappear the instruction after assigning NULL,
> this really should be a RELEASE barrier. That stops loads from hitting
> dead memory too.
> 

+Paul McKenney

Per the introduction of the comment below from:

	f1a11e0 futex: remove the wait queue

I believe the intent was to prevent the plist_del in ... the previous
__unqueue_futex(q) ... from getting ahead of the smp_store_release added here,
which could result in q being destroyed by the waking task before plist_del can
act on it. Is that right?

The comment below predates the refactoring which hid plist_del under
__unqueue_futex(), making the associated plist_del a bit less clear.

However, since this comment, we have moved the wake-up out of wake_futex through
the use of wake queues (wake_up_q) which now happens after the hb lock is
released (see futex_wake, futex_wake_op, and futex_requeue). Is this race still
a valid concern?


> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  kernel/futex.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1288,8 +1288,7 @@ static void mark_wake_futex(struct wake_
>  	 * memory barrier is required here to prevent the following
>  	 * store to lock_ptr from getting ahead of the plist_del.
>  	 */
> -	smp_wmb();
> -	q->lock_ptr = NULL;
> +	smp_store_release(&q->lock_ptr, NULL);
>  }
>  
>  static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *top_waiter,
> 
> 
> 

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
  2016-12-16 23:31 ` Darren Hart
@ 2016-12-17 13:52   ` Peter Zijlstra
  2016-12-18 22:39     ` Darren Hart
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-17 13:52 UTC (permalink / raw)
  To: Darren Hart
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot

On Fri, Dec 16, 2016 at 03:31:40PM -0800, Darren Hart wrote:
> On Tue, Dec 13, 2016 at 09:36:38AM +0100, Peter Zijlstra wrote:
> > That way, when we drop hb->lock to wait, futex and rt_mutex wait state is
> > consistent.
> > 
> > 
> > In any case, it passes our inadequate testing.
> 
> It passed my CI tools/testing/selftests/futex/functional/run.sh. Did you also
> happen to run a fuzz tester?

I did not. I'm not sure how good trinity is at poking holes in futexes.
I would love a domain specific fuzzer for futex, but I suspect it would
end up being me writing it :-(

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 02/10] futex: Add missing error handling to FUTEX_REQUEUE_PI
  2016-12-17  0:06   ` Darren Hart
@ 2016-12-17 13:54     ` Peter Zijlstra
  2016-12-18 23:31       ` Darren Hart
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2016-12-17 13:54 UTC (permalink / raw)
  To: Darren Hart
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot

On Fri, Dec 16, 2016 at 04:06:39PM -0800, Darren Hart wrote:
> On Tue, Dec 13, 2016 at 09:36:40AM +0100, Peter Zijlstra wrote:
> > Thomas spotted that fixup_pi_state_owner() can return errors and we
> > fail to unlock the rt_mutex in that case.
> > 
> 
> We handled this explicitly before Patch 1/10, so can this be rolled into 1/10
> (er 9) as a single commit?

I don't think we did; see how this branch doesn't set pi_mutex.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
  2016-12-17 13:52   ` Peter Zijlstra
@ 2016-12-18 22:39     ` Darren Hart
  0 siblings, 0 replies; 28+ messages in thread
From: Darren Hart @ 2016-12-18 22:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot

On Sat, Dec 17, 2016 at 02:52:14PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 03:31:40PM -0800, Darren Hart wrote:
> > On Tue, Dec 13, 2016 at 09:36:38AM +0100, Peter Zijlstra wrote:
> > > That way, when we drop hb->lock to wait, futex and rt_mutex wait state is
> > > consistent.
> > > 
> > > 
> > > In any case, it passes our inadequate testing.
> > 
> > It passed my CI tools/testing/selftests/futex/functional/run.sh. Did you also
> > happen to run a fuzz tester?
> 
> I did not. I'm not sure how good trinity is at poking holes in futexes.
> I would love a domain specific fuzzer for futex, but I suspect it would
> end up being me writing it :-(
> 

Trinity had some futex awareness and found several issues in the past; I can't
say how likely it is to find more. I haven't tried ... syzcaller? yet.

I had set out to do this in the futextest suite, which I later merged into
kselftests, but I scrapped the fuzz testing, deferring to trinity. Perhaps that
is a project we should resurrect.

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 02/10] futex: Add missing error handling to FUTEX_REQUEUE_PI
  2016-12-17 13:54     ` Peter Zijlstra
@ 2016-12-18 23:31       ` Darren Hart
  0 siblings, 0 replies; 28+ messages in thread
From: Darren Hart @ 2016-12-18 23:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot

On Sat, Dec 17, 2016 at 02:54:11PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 04:06:39PM -0800, Darren Hart wrote:
> > On Tue, Dec 13, 2016 at 09:36:40AM +0100, Peter Zijlstra wrote:
> > > Thomas spotted that fixup_pi_state_owner() can return errors and we
> > > fail to unlock the rt_mutex in that case.
> > > 
> > 
> > We handled this explicitly before Patch 1/10, so can this be rolled into 1/10
> > (er 9) as a single commit?
> 
> I don't think we did, see how this branch doesn't set pi_mutex.
> 

So it is, I stand corrected. So for 2/10 also then:

Reviewed-by: Darren Hart <dvhart@linux.intel.com>

-- 
Darren Hart
Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
  2016-12-13 16:07 ` [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
@ 2017-02-22 11:02   ` Peter Zijlstra
  2017-02-22 15:36     ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2017-02-22 11:02 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart

On Tue, Dec 13, 2016 at 05:07:14PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 13, 2016 at 09:36:38AM +0100, Peter Zijlstra wrote:
> 
> > The basic idea is to, like requeue PI, break the rt_mutex_lock() function into
> > pieces, such that we can enqueue the waiter while holding hb->lock, wait for
> > acquisition without hb->lock and can remove the waiter, on failure, while
> > holding hb->lock again.
> > 
> > That way, when we drop hb->lock to wait, futex and rt_mutex wait state is
> > consistent.
> 
> And of course, there's a hole in...
> 
> There is a point in futex_unlock_pi() where we hold neither hb->lock nor
> wait_lock, at that point a futex_lock_pi() that had failed its
> rt_mutex_wait_proxy_lock() can sneak in and remove itself, even though
> we saw its waiter, recreating a variant of the initial problem.
> 
> The below plugs the hole, but it's rather fragile in that it relies on
> overlapping critical sections and the specific detail that we call
> rt_mutex_cleanup_proxy_lock() immediately after (re)acquiring hb->lock.
> 
> There is another solution, but that's more involved and uglier still.
> 
> I'll give it a bit more thought.
> 

OK, so after having not thought about this, and then spent the last two
days trying to cram all this nonsense back into my head, I think I have
a slightly simpler option.

In any case, I'll go respin the patch-set and repost.


--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1395,7 +1395,18 @@ static int wake_futex_pi(u32 __user *uad
 
 	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
-	BUG_ON(!new_owner);
+	if (!new_owner) {
+		/*
+		 * Since we held neither hb->lock nor wait_lock when coming
+		 * into this function, we could have raced with futex_lock_pi()
+		 * such that it will have removed the waiter that brought us
+		 * here.
+		 *
+		 * In this case, retry the entire operation.
+		 */
+		ret = -EAGAIN;
+		goto out_unlock;
+	}
 
 	/*
 	 * We pass it to the next owner. The WAITERS bit is always kept
@@ -2657,8 +2668,8 @@ static int futex_lock_pi(u32 __user *uad
 	 * rt_mutex waitqueue, such that we can keep the hb and rt_mutex
 	 * wait lists consistent.
 	 */
-	if (ret)
-		rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter);
+	if (ret && !rt_mutex_cleanup_proxy_lock(&q.pi_state->pi_mutex, &rt_waiter))
+		ret = 0;
 
 did_trylock:
 	/*
@@ -3043,8 +3054,9 @@ static int futex_wait_requeue_pi(u32 __u
 		debug_rt_mutex_free_waiter(&rt_waiter);
 
 		spin_lock(q.lock_ptr);
-		if (ret)
-			rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter);
+		if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter))
+			ret = 0;
+
 		/*
 		 * Fixup the pi_state owner and possibly acquire the lock if we
 		 * haven't already.
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1781,16 +1781,29 @@ int rt_mutex_wait_proxy_lock(struct rt_m
  *
  * Clean up the failed lock acquisition as per rt_mutex_wait_proxy_lock().
  *
+ * Returns:
+ *  true  - did the cleanup, we are done.
+ *  false - we acquired the lock after rt_mutex_wait_proxy_lock() returned,
+ *          caller should disregard its return value.
+ *
  * Special API call for PI-futex support
  */
-void rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
+bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 				 struct rt_mutex_waiter *waiter)
 {
-	raw_spin_lock_irq(&lock->wait_lock);
-
-	remove_waiter(lock, waiter);
-	fixup_rt_mutex_waiters(lock);
+	bool cleanup = false;
 
+	raw_spin_lock_irq(&lock->wait_lock);
+	/*
+	 * If we acquired the lock, no cleanup required.
+	 */
+	if (rt_mutex_owner(lock) != current) {
+		remove_waiter(lock, waiter);
+		fixup_rt_mutex_waiters(lock);
+		cleanup = true;
+	}
 	raw_spin_unlock_irq(&lock->wait_lock);
+
+	return cleanup;
 }
 
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -106,11 +106,10 @@ extern void rt_mutex_proxy_unlock(struct
 extern int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 				     struct rt_mutex_waiter *waiter,
 				     struct task_struct *task);
-
 extern int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
 			       struct hrtimer_sleeper *to,
 			       struct rt_mutex_waiter *waiter);
-extern void rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
+extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 				 struct rt_mutex_waiter *waiter);
 
 extern int rt_mutex_futex_trylock(struct rt_mutex *l);

^ permalink raw reply	[flat|nested] 28+ messages in thread
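[Editorial note: the rt_mutex_cleanup_proxy_lock() contract in the patch above — cleanup runs only if we did not become owner in the meantime — can be sketched in userspace as follows. The struct and names are illustrative stand-ins, not kernel APIs, and the real code holds wait_lock around the check.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of the cleanup contract: after a failed timed wait, re-check
 * (under the lock's wait_lock in the real code) whether we became the
 * owner anyway; only remove the waiter if we did not.
 */
struct fake_lock {
	const char *owner;	/* stands in for rt_mutex_owner() */
	int nr_waiters;
};

/* Returns true if cleanup ran; false if we own the lock after all. */
bool cleanup_proxy_lock(struct fake_lock *lock, const char *me)
{
	bool cleanup = false;

	/* wait_lock would be held around this block in the real code */
	if (lock->owner != me) {
		lock->nr_waiters--;	/* remove_waiter() analogue */
		cleanup = true;
	}
	return cleanup;
}
```

The false return tells the caller to treat the earlier wait as having succeeded, which is exactly how the `if (ret && !rt_mutex_cleanup_proxy_lock(...)) ret = 0;` hunks above use it.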

* Re: [PATCH -v4 04/10] futex: Use smp_store_release() in mark_wake_futex()
  2016-12-17  0:50   ` Darren Hart
@ 2017-02-22 14:03     ` Peter Zijlstra
  2017-03-17  0:35       ` Darren Hart
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2017-02-22 14:03 UTC (permalink / raw)
  To: Darren Hart
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, paulmck

On Fri, Dec 16, 2016 at 04:50:45PM -0800, Darren Hart wrote:
> On Tue, Dec 13, 2016 at 09:36:42AM +0100, Peter Zijlstra wrote:
> > Since the futex_q can disappear the instruction after assigning NULL,
> > this really should be a RELEASE barrier. That stops loads from hitting
> > dead memory too.
> > 
> 
> +Paul McKenney
> 
> Per the introduction of the comment below from:
> 
> 	f1a11e0 futex: remove the wait queue
> 
> I believe the intent was to prevent the plist_del in ... the previous
> __unqueue_futex(q) ... from getting ahead of the smp_store_release added here,
> which could result in q being destroyed by the waking task before plist_del can
> act on it. Is that right?
> 
> The comment below predates the refactoring which hid plist_del under
> __unqueue_futex(), making the associated plist_del a bit less clear.
> 
> However, since this comment, we have moved the wake-up out of wake_futex through
> the use of wake queues (wake_up_q) which now happens after the hb lock is
> released (see futex_wake, futex_wake_op, and futex_requeue). Is this race still
> a valid concern?

Yes, I think so: since __unqueue_futex() dereferences lock_ptr and does
stores in the memory it points to, those stores must not happen _after_
we NULL lock_ptr itself.

futex_wait(), which calls unqueue_me() could have had a spurious wakeup
and observe our NULL store and 'free' the futex_q.

^ permalink raw reply	[flat|nested] 28+ messages in thread
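[Editorial note: the ordering argument above — the plist_del stores must be visible before the waiter can observe lock_ptr == NULL and free the futex_q — is the classic release/acquire pairing. A minimal userspace sketch using C11 atomics, with illustrative names standing in for the kernel structures:]

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Userspace sketch (illustrative names, not the kernel API) of why the
 * NULL store needs release semantics: every store made while "queued"
 * (the plist_del analogue) must be visible before the waiter can
 * observe lock_ptr == NULL and free the futex_q.
 */
struct fake_futex_q {
	int on_list;			/* stands in for the plist linkage */
	_Atomic(void *) lock_ptr;	/* stands in for q->lock_ptr */
};

static struct fake_futex_q q;

static void *waker(void *arg)
{
	(void)arg;
	q.on_list = 0;	/* plist_del analogue: plain store */
	/* release: orders the store above before the NULL store */
	atomic_store_explicit(&q.lock_ptr, NULL, memory_order_release);
	return NULL;
}

/* Returns the value of on_list seen after observing lock_ptr == NULL. */
int wait_for_wake(void)
{
	pthread_t t;
	int seen;

	q.on_list = 1;
	atomic_store_explicit(&q.lock_ptr, (void *)&q, memory_order_relaxed);
	pthread_create(&t, NULL, waker, NULL);
	/* acquire: pairs with the release store in waker() */
	while (atomic_load_explicit(&q.lock_ptr, memory_order_acquire) != NULL)
		;
	seen = q.on_list;	/* guaranteed 0: plist_del happened-before */
	pthread_join(t, NULL);
	return seen;
}
```

With only a plain store of NULL (no release), the waiter could see NULL while the earlier unqueue stores are still in flight, which is the use-after-free window being discussed.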

* Re: [PATCH -v4 03/10] futex: Cleanup variable names for futex_top_waiter()
  2016-12-17  0:13   ` Darren Hart
@ 2017-02-22 14:07     ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2017-02-22 14:07 UTC (permalink / raw)
  To: Darren Hart
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot

On Fri, Dec 16, 2016 at 04:13:10PM -0800, Darren Hart wrote:
> On Tue, Dec 13, 2016 at 09:36:41AM +0100, Peter Zijlstra wrote:
> > futex_top_waiter() returns the top-waiter on the pi_mutex. Assigning
> > this to a variable 'match' totally obscures the code.
> > 
> 
> Yes please.
> 
> One wording nit...
> 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >  kernel/futex.c |   30 +++++++++++++++---------------
> >  1 file changed, 15 insertions(+), 15 deletions(-)
> > 
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> 
> > @@ -1317,11 +1317,11 @@ static int wake_futex_pi(u32 __user *uad
> >  
> >  	/*
> >  	 * It is possible that the next waiter (the one that brought
> > -	 * this owner to the kernel) timed out and is no longer
> > +	 * top_waiter owner to the kernel) timed out and is no longer
> >  	 * waiting on the lock.
> >  	 */
> 
> This breaks my parser (and did before too).
> 
> Consider:
> 
> /*
>  * It is possible that the next waiter (that caused top_waiter to call
>  * into the kernel) has since timed out and is no longer waiting on the
>  * lock.
>  */
> 
> Is that clearer?

No :-)

Fear not, this wee comment here is what the next many patches are about.
This is the 'little' hole that opens up sooo much pain. In any case, the
comment will die later on.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
  2017-02-22 11:02   ` Peter Zijlstra
@ 2017-02-22 15:36     ` Peter Zijlstra
  2017-03-01  9:05       ` Thomas Gleixner
  0 siblings, 1 reply; 28+ messages in thread
From: Peter Zijlstra @ 2017-02-22 15:36 UTC (permalink / raw)
  To: tglx
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart

On Wed, Feb 22, 2017 at 12:02:44PM +0100, Peter Zijlstra wrote:
> OK, so after having not thought about this, and then spent the last two
> days trying to cram all this nonsense back into my head, I think I have
> a slightly simpler option.
> 
> In any case, I'll go respin the patch-set and repost.

That is; what is wrong with the below patch against mainline?

---

 kernel/futex.c | 48 +++++++++++++-----------------------------------
 1 file changed, 13 insertions(+), 35 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index c591a2a..fafa25a 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1318,12 +1318,18 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
 	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
 
 	/*
-	 * It is possible that the next waiter (the one that brought
-	 * this owner to the kernel) timed out and is no longer
-	 * waiting on the lock.
+	 * When we interleave with futex_lock_pi() where it does
+	 * rt_mutex_timed_futex_lock(), we might observe @this futex_q waiter,
+	 * but the rt_mutex's wait_list can be empty (either still, or again,
+	 * depending on which side we land).
+	 *
+	 * When this happens, give up our locks and try again, giving the
+	 * futex_lock_pi() instance time to complete and unqueue_me().
 	 */
-	if (!new_owner)
-		new_owner = this->task;
+	if (!new_owner) {
+		raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
+		return -EAGAIN;
+	}
 
 	/*
 	 * We pass it to the next owner. The WAITERS bit is always
@@ -2245,43 +2251,15 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
 	}
 
 	/*
-	 * Catch the rare case, where the lock was released when we were on the
-	 * way back before we locked the hash bucket.
-	 */
-	if (q->pi_state->owner == current) {
-		/*
-		 * Try to get the rt_mutex now. This might fail as some other
-		 * task acquired the rt_mutex after we removed ourself from the
-		 * rt_mutex waiters list.
-		 */
-		if (rt_mutex_trylock(&q->pi_state->pi_mutex)) {
-			locked = 1;
-			goto out;
-		}
-
-		/*
-		 * pi_state is incorrect, some other task did a lock steal and
-		 * we returned due to timeout or signal without taking the
-		 * rt_mutex. Too late.
-		 */
-		raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
-		owner = rt_mutex_owner(&q->pi_state->pi_mutex);
-		if (!owner)
-			owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
-		raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
-		ret = fixup_pi_state_owner(uaddr, q, owner);
-		goto out;
-	}
-
-	/*
 	 * Paranoia check. If we did not take the lock, then we should not be
 	 * the owner of the rt_mutex.
 	 */
-	if (rt_mutex_owner(&q->pi_state->pi_mutex) == current)
+	if (rt_mutex_owner(&q->pi_state->pi_mutex) == current) {
 		printk(KERN_ERR "fixup_owner: ret = %d pi-mutex: %p "
 				"pi-state %p\n", ret,
 				q->pi_state->pi_mutex.owner,
 				q->pi_state->owner);
+	}
 
 out:
 	return ret ? ret : locked;

^ permalink raw reply related	[flat|nested] 28+ messages in thread
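[Editorial note: the -EAGAIN return added above implies a retry loop in the caller — back out and redo the whole lookup when the racing futex_lock_pi() removed the waiter we saw. A sketch of that pattern, where do_unlock_once() is a hypothetical stand-in for the real operation, not a kernel function:]

```c
#include <assert.h>
#include <errno.h>

/*
 * Sketch of the retry pattern: if the operation reports that the
 * waiter vanished under us (-EAGAIN), drop everything and redo the
 * whole operation from the top.
 */
static int fail_count;	/* simulate N lost races */

static int do_unlock_once(void)
{
	if (fail_count > 0) {
		fail_count--;
		return -EAGAIN;	/* waiter vanished under us */
	}
	return 0;		/* handed the lock off */
}

int unlock_retry(int simulated_races)
{
	int ret;

	fail_count = simulated_races;
	do {
		/* In the kernel this would re-read uval and re-take hb->lock. */
		ret = do_unlock_once();
	} while (ret == -EAGAIN);
	return ret;
}
```

The retry is safe because the racing futex_lock_pi() instance makes forward progress each time, so the loop cannot spin indefinitely.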

* Re: [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
  2017-02-22 15:36     ` Peter Zijlstra
@ 2017-03-01  9:05       ` Thomas Gleixner
  2017-03-03  9:35         ` Peter Zijlstra
  0 siblings, 1 reply; 28+ messages in thread
From: Thomas Gleixner @ 2017-03-01  9:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart

On Wed, 22 Feb 2017, Peter Zijlstra wrote:
> On Wed, Feb 22, 2017 at 12:02:44PM +0100, Peter Zijlstra wrote:
> > OK, so after having not thought about this, and then spent the last two
> > days trying to cram all this nonsense back into my head, I think I have
> > a slightly simpler option.
> > 
> > In any case, I'll go respin the patch-set and repost.
> 
> That is; what is wrong with the below patch against mainline?

After staring at it for a couple of hours and paging the futex horror back
in, I don't see any problem with that.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles
  2017-03-01  9:05       ` Thomas Gleixner
@ 2017-03-03  9:35         ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2017-03-03  9:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, dvhart

On Wed, Mar 01, 2017 at 10:05:22AM +0100, Thomas Gleixner wrote:
> On Wed, 22 Feb 2017, Peter Zijlstra wrote:
> > On Wed, Feb 22, 2017 at 12:02:44PM +0100, Peter Zijlstra wrote:
> > > OK, so after having not thought about this, and then spent the last two
> > > days trying to cram all this nonsense back into my head, I think I have
> > > a slightly simpler option.
> > > 
> > > In any case, I'll go respin the patch-set and repost.
> > 
> > That is; what is wrong with the below patch against mainline?
> 
> After staring at it for a couple of hours and paging the futex horror back
> in, I don't see any problem with that.

Awesome, that makes things simpler. Let me go respin the patches.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH -v4 04/10] futex: Use smp_store_release() in mark_wake_futex()
  2017-02-22 14:03     ` Peter Zijlstra
@ 2017-03-17  0:35       ` Darren Hart
  0 siblings, 0 replies; 28+ messages in thread
From: Darren Hart @ 2017-03-17  0:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, juri.lelli, rostedt, xlpang, bigeasy, linux-kernel,
	mathieu.desnoyers, jdesfossez, bristot, paulmck

On Wed, Feb 22, 2017 at 03:03:16PM +0100, Peter Zijlstra wrote:
> On Fri, Dec 16, 2016 at 04:50:45PM -0800, Darren Hart wrote:
> > On Tue, Dec 13, 2016 at 09:36:42AM +0100, Peter Zijlstra wrote:
> > > Since the futex_q can disappear the instruction after assigning NULL,
> > > this really should be a RELEASE barrier. That stops loads from hitting
> > > dead memory too.
> > > 
> > 
> > +Paul McKenney
> > 
> > Per the introduction of the comment below from:
> > 
> > 	f1a11e0 futex: remove the wait queue
> > 
> > I believe the intent was to prevent the plist_del in ... the previous
> > __unqueue_futex(q) ... from getting ahead of the smp_store_release added
> > here, which could result in q being destroyed by the waking task before
> > plist_del can act on it. Is that right?
> > 
> > The comment below predates the refactoring that hid plist_del under
> > __unqueue_futex(), which makes the associated plist_del a bit less obvious:
> > 
> > However, since this comment, we have moved the wake-up out of wake_futex through
> > the use of wake queues (wake_up_q) which now happens after the hb lock is
> > released (see futex_wake, futex_wake_op, and futex_requeue). Is this race still
> > a valid concern?
> 
> Yes I think so, since __unqueue_futex() dereferences lock_ptr and does
> stores in the memory it points to, those stores must not happen _after_
> we NULL lock_ptr itself.

Are you referring to the q->lock_ptr = NULL in mark_wake_futex()? 
So the concern is parallel mark_wake_futex() calls on the same futex? But that
can't happen because the call is wrapped by the hb locks. In what scenario can
this occur?

> futex_wait(), which calls unqueue_me(), could have had a spurious wakeup,
> observed our NULL store and 'freed' the futex_q.

Urg. Spurious wakeups... yes... OK, still necessary. Gah. :-(

-- 
Darren Hart
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 28+ messages in thread
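[Editorial note: the ordering concern in the exchange above can be sketched as
a small userspace analogue, using C11 atomics in place of the kernel's
smp_store_release(). All names here (fake_futex_q, fake_mark_wake, on_list) are
invented for illustration; this is not the kernel's actual code.]

```c
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for struct futex_q: the waiter frees this
 * structure the instant it observes lock_ptr == NULL (e.g. after a
 * spurious wakeup in unqueue_me()), so every store the waker makes
 * into it must be ordered before the NULL store -- hence a release
 * store, not a plain assignment. */
struct fake_futex_q {
	int on_list;              /* stand-in for the plist membership */
	_Atomic(void *) lock_ptr; /* stand-in for q->lock_ptr */
};

/* Analogue of mark_wake_futex(): first plain stores into *q (the
 * __unqueue_futex() stand-in), then a release store of NULL. The
 * release ordering guarantees that a waiter observing
 * lock_ptr == NULL with an acquire load also observes on_list == 0,
 * so it cannot free *q while the waker's stores are still in flight. */
static void fake_mark_wake(struct fake_futex_q *q)
{
	q->on_list = 0;
	atomic_store_explicit(&q->lock_ptr, NULL, memory_order_release);
}
```

With a plain (relaxed) store of NULL instead, the compiler or CPU could reorder
the on_list store after it, reintroducing exactly the use-after-free the thread
describes.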

end of thread, other threads:[~2017-03-17  0:35 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-13  8:36 [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
2016-12-13  8:36 ` [PATCH -v4 01/10] futex: Fix potential use-after-free in FUTEX_REQUEUE_PI Peter Zijlstra
2016-12-16 23:58   ` Darren Hart
2016-12-13  8:36 ` [PATCH -v4 02/10] futex: Add missing error handling to FUTEX_REQUEUE_PI Peter Zijlstra
2016-12-17  0:06   ` Darren Hart
2016-12-17 13:54     ` Peter Zijlstra
2016-12-18 23:31       ` Darren Hart
2016-12-13  8:36 ` [PATCH -v4 03/10] futex: Cleanup variable names for futex_top_waiter() Peter Zijlstra
2016-12-17  0:13   ` Darren Hart
2017-02-22 14:07     ` Peter Zijlstra
2016-12-13  8:36 ` [PATCH -v4 04/10] futex: Use smp_store_release() in mark_wake_futex() Peter Zijlstra
2016-12-17  0:50   ` Darren Hart
2017-02-22 14:03     ` Peter Zijlstra
2017-03-17  0:35       ` Darren Hart
2016-12-13  8:36 ` [PATCH -v4 05/10] futex: Remove rt_mutex_deadlock_account_*() Peter Zijlstra
2016-12-13  8:36 ` [PATCH -v4 06/10] futex,rt_mutex: Provide futex specific rt_mutex API Peter Zijlstra
2016-12-13  8:36 ` [PATCH -v4 07/10] futex: Change locking Peter Zijlstra
2016-12-13  8:36 ` [PATCH -v4 08/10] futex: Rework futex_lock_pi() vs rt_mutex_timed_futex_lock() Peter Zijlstra
2016-12-13  8:36 ` [PATCH -v4 09/10] futex: Remove inconsistent hb/rt_mutex state magic Peter Zijlstra
2016-12-13  8:36 ` [PATCH -v4 10/10] futex: Pull rt_mutex_futex_unlock() out from under hb->lock Peter Zijlstra
2016-12-13 16:07 ` [PATCH -v4 00/10] FUTEX_UNLOCK_PI wobbles Peter Zijlstra
2017-02-22 11:02   ` Peter Zijlstra
2017-02-22 15:36     ` Peter Zijlstra
2017-03-01  9:05       ` Thomas Gleixner
2017-03-03  9:35         ` Peter Zijlstra
2016-12-16 23:31 ` Darren Hart
2016-12-17 13:52   ` Peter Zijlstra
2016-12-18 22:39     ` Darren Hart
