All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 0/5] futex: More robustness tweaks
@ 2014-06-11 20:45 Thomas Gleixner
  2014-06-11 20:45 ` [patch 1/5] futex: Make unlock_pi more robust Thomas Gleixner
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-11 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Kees Cook, wad

While looking for the minimal functional fix for the futex CVE, I
found a few things which can be done simpler and therefor make the
code more robust.

1) UNLOCK_PI

   Change the ordering:
   
   - Lookup waiters first. If waiters exist wake up the top priority
     waiter with all sanity checks applied. That allows us to catch
     manipulation of the user space value.

   - Only if there are no waiters, do the atomic release

2) futex_lock_pi_atomic()

   Its a maze of retry hoops and loops. Reduce it to simple and
   userstandable states.

   That requires to split out the lookup and validation functions from
   lookup_pi_state(), but that turns out to be an overall win on
   readabilty.

The overall cleanup results in less code and 488 bytes text size
reduction on x8664.

Thanks,

	tglx
---
 futex.c |  385 ++++++++++++++++++++++++++++------------------------------------
 1 file changed, 171 insertions(+), 214 deletions(-)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch 1/5] futex: Make unlock_pi more robust
  2014-06-11 20:45 [patch 0/5] futex: More robustness tweaks Thomas Gleixner
@ 2014-06-11 20:45 ` Thomas Gleixner
  2014-06-16 16:18   ` Darren Hart
  2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
  2014-06-11 20:45 ` [patch 3/5] futex: Split out the waiter check from lookup_pi_state() Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-11 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Kees Cook, wad

[-- Attachment #1: futex-make-unlock-pi-more-robust.patch --]
[-- Type: text/plain, Size: 4267 bytes --]

The kernel tries to atomically unlock the futex without checking
whether there is kernel state associated to the futex.

So if user space manipulated the user space value, this will leave
kernel internal state around associated to the owner task. 

For robustness sake, lookup first whether there are waiters on the
futex. If there are waiters, wake the top priority waiter with all the
proper sanity checks applied.

If there are no waiters, do the atomic release. We do not have to
preserve the waiters bit in this case, because a potentially incoming
waiter is blocked on the hb->lock and will acquire the futex
atomically. We neither have to preserve the owner died bit. The caller
is the owner and it was supposed to cleanup the mess.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c |   78 +++++++++++++++++++--------------------------------------
 1 file changed, 26 insertions(+), 52 deletions(-)

Index: linux/kernel/futex.c
===================================================================
--- linux.orig/kernel/futex.c
+++ linux/kernel/futex.c
@@ -1186,22 +1186,6 @@ static int wake_futex_pi(u32 __user *uad
 	return 0;
 }
 
-static int unlock_futex_pi(u32 __user *uaddr, u32 uval)
-{
-	u32 uninitialized_var(oldval);
-
-	/*
-	 * There is no waiter, so we unlock the futex. The owner died
-	 * bit has not to be preserved here. We are the owner:
-	 */
-	if (cmpxchg_futex_value_locked(&oldval, uaddr, uval, 0))
-		return -EFAULT;
-	if (oldval != uval)
-		return -EAGAIN;
-
-	return 0;
-}
-
 /*
  * Express the locking dependencies for lockdep:
  */
@@ -2401,10 +2385,10 @@ uaddr_faulted:
  */
 static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 {
-	struct futex_hash_bucket *hb;
-	struct futex_q *this, *next;
+	u32 uninitialized_var(curval), uval, vpid = task_pid_vnr(current);
 	union futex_key key = FUTEX_KEY_INIT;
-	u32 uval, vpid = task_pid_vnr(current);
+	struct futex_hash_bucket *hb;
+	struct futex_q *match;
 	int ret;
 
 retry:
@@ -2417,57 +2401,47 @@ retry:
 		return -EPERM;
 
 	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, VERIFY_WRITE);
-	if (unlikely(ret != 0))
-		goto out;
+	if (ret)
+		return ret;
 
 	hb = hash_futex(&key);
 	spin_lock(&hb->lock);
 
 	/*
-	 * To avoid races, try to do the TID -> 0 atomic transition
-	 * again. If it succeeds then we can return without waking
-	 * anyone else up. We only try this if neither the waiters nor
-	 * the owner died bit are set.
-	 */
-	if (!(uval & ~FUTEX_TID_MASK) &&
-	    cmpxchg_futex_value_locked(&uval, uaddr, vpid, 0))
-		goto pi_faulted;
-	/*
-	 * Rare case: we managed to release the lock atomically,
-	 * no need to wake anyone else up:
-	 */
-	if (unlikely(uval == vpid))
-		goto out_unlock;
-
-	/*
-	 * Ok, other tasks may need to be woken up - check waiters
-	 * and do the wakeup if necessary:
-	 */
-	plist_for_each_entry_safe(this, next, &hb->chain, list) {
-		if (!match_futex (&this->key, &key))
-			continue;
-		ret = wake_futex_pi(uaddr, uval, this);
+	 * Check waiters first. We do not trust user space values at
+	 * all and we at least want to know if user space fiddled
+	 * with the futex value instead of blindly unlocking.
+	 */
+	match = futex_top_waiter(hb, &key);
+	if (match) {
+		ret = wake_futex_pi(uaddr, uval, match);
 		/*
-		 * The atomic access to the futex value
-		 * generated a pagefault, so retry the
-		 * user-access and the wakeup:
+		 * The atomic access to the futex value generated a
+		 * pagefault, so retry the user-access and the wakeup:
 		 */
 		if (ret == -EFAULT)
 			goto pi_faulted;
 		goto out_unlock;
 	}
+
 	/*
-	 * No waiters - kernel unlocks the futex:
+	 * We have no kernel internal state, i.e. no waiters in the
+	 * kernel. Waiters which are about to queue themself are stuck
+	 * on hb->lock. So we can safely ignore them. We do neither
+	 * preserve the WAITERS bit not the OWNER_DIED one. We are the
+	 * owner.
 	 */
-	ret = unlock_futex_pi(uaddr, uval);
-	if (ret == -EFAULT)
+	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))
 		goto pi_faulted;
 
+	/*
+	 * If uval has changed, let user space handle it.
+	 */
+	ret = (curval == uval) ? 0 : -EAGAIN;
+
 out_unlock:
 	spin_unlock(&hb->lock);
 	put_futex_key(&key);
-
-out:
 	return ret;
 
 pi_faulted:



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch 3/5] futex: Split out the waiter check from lookup_pi_state()
  2014-06-11 20:45 [patch 0/5] futex: More robustness tweaks Thomas Gleixner
  2014-06-11 20:45 ` [patch 1/5] futex: Make unlock_pi more robust Thomas Gleixner
@ 2014-06-11 20:45 ` Thomas Gleixner
  2014-06-16 18:12   ` Darren Hart
  2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
  2014-06-11 20:45 ` [patch 2/5] futex: Use futex_top_waiter() in lookup_pi_state() Thomas Gleixner
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-11 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Kees Cook, wad

[-- Attachment #1: futex-split-waiter-check-out.patch --]
[-- Type: text/plain, Size: 4911 bytes --]

We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the waiter verification into a separate
function. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c |  138 +++++++++++++++++++++++++++++----------------------------
 1 file changed, 71 insertions(+), 67 deletions(-)

Index: linux/kernel/futex.c
===================================================================
--- linux.orig/kernel/futex.c
+++ linux/kernel/futex.c
@@ -792,92 +792,96 @@ void exit_pi_state_list(struct task_stru
  * [10] There is no transient state which leaves owner and user space
  *	TID out of sync.
  */
-static int
-lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
-		union futex_key *key, struct futex_pi_state **ps)
+
+/*
+ * Validate that the existing waiter has a pi_state and sanity check
+ * the pi_state against the user space value. If correct, attach to
+ * it.
+ */
+static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state,
+			      struct futex_pi_state **ps)
 {
-	struct futex_q *match = futex_top_waiter(hb, key);
-	struct futex_pi_state *pi_state = NULL;
-	struct task_struct *p;
 	pid_t pid = uval & FUTEX_TID_MASK;
 
-	if (match) {
-		/*
-		 * Sanity check the waiter before increasing the
-		 * refcount and attaching to it.
-		 */
-		pi_state = match->pi_state;
-		/*
-		 * Userspace might have messed up non-PI and PI
-		 * futexes [3]
-		 */
-		if (unlikely(!pi_state))
-			return -EINVAL;
+	/*
+	 * Userspace might have messed up non-PI and PI futexes [3]
+	 */
+	if (unlikely(!pi_state))
+		return -EINVAL;
 
-		WARN_ON(!atomic_read(&pi_state->refcount));
+	WARN_ON(!atomic_read(&pi_state->refcount));
 
+	/*
+	 * Handle the owner died case:
+	 */
+	if (uval & FUTEX_OWNER_DIED) {
 		/*
-		 * Handle the owner died case:
+		 * exit_pi_state_list sets owner to NULL and wakes the
+		 * topmost waiter. The task which acquires the
+		 * pi_state->rt_mutex will fixup owner.
 		 */
-		if (uval & FUTEX_OWNER_DIED) {
-			/*
-			 * exit_pi_state_list sets owner to NULL and
-			 * wakes the topmost waiter. The task which
-			 * acquires the pi_state->rt_mutex will fixup
-			 * owner.
-			 */
-			if (!pi_state->owner) {
-				/*
-				 * No pi state owner, but the user
-				 * space TID is not 0. Inconsistent
-				 * state. [5]
-				 */
-				if (pid)
-					return -EINVAL;
-				/*
-				 * Take a ref on the state and
-				 * return. [4]
-				 */
-				goto out_state;
-			}
-
+		if (!pi_state->owner) {
 			/*
-			 * If TID is 0, then either the dying owner
-			 * has not yet executed exit_pi_state_list()
-			 * or some waiter acquired the rtmutex in the
-			 * pi state, but did not yet fixup the TID in
-			 * user space.
-			 *
-			 * Take a ref on the state and return. [6]
+			 * No pi state owner, but the user space TID
+			 * is not 0. Inconsistent state. [5]
 			 */
-			if (!pid)
-				goto out_state;
-		} else {
+			if (pid)
+				return -EINVAL;
 			/*
-			 * If the owner died bit is not set,
-			 * then the pi_state must have an
-			 * owner. [7]
+			 * Take a ref on the state and return success. [4]
 			 */
-			if (!pi_state->owner)
-				return -EINVAL;
+			goto out_state;
 		}
 
 		/*
-		 * Bail out if user space manipulated the
-		 * futex value. If pi state exists then the
-		 * owner TID must be the same as the user
-		 * space TID. [9/10]
+		 * If TID is 0, then either the dying owner has not
+		 * yet executed exit_pi_state_list() or some waiter
+		 * acquired the rtmutex in the pi state, but did not
+		 * yet fixup the TID in user space.
+		 *
+		 * Take a ref on the state and return success. [6]
 		 */
-		if (pid != task_pid_vnr(pi_state->owner))
+		if (!pid)
+			goto out_state;
+	} else {
+		/*
+		 * If the owner died bit is not set, then the pi_state
+		 * must have an owner. [7]
+		 */
+		if (!pi_state->owner)
 			return -EINVAL;
-
-	out_state:
-		atomic_inc(&pi_state->refcount);
-		*ps = pi_state;
-		return 0;
 	}
 
 	/*
+	 * Bail out if user space manipulated the futex value. If pi
+	 * state exists then the owner TID must be the same as the
+	 * user space TID. [9/10]
+	 */
+	if (pid != task_pid_vnr(pi_state->owner))
+		return -EINVAL;
+out_state:
+	atomic_inc(&pi_state->refcount);
+	*ps = pi_state;
+	return 0;
+}
+
+static int
+lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
+		union futex_key *key, struct futex_pi_state **ps)
+{
+	struct futex_q *match = futex_top_waiter(hb, key);
+	struct futex_pi_state *pi_state = NULL;
+	struct task_struct *p;
+	pid_t pid = uval & FUTEX_TID_MASK;
+
+	/*
+	 * If there is a waiter on that futex, validate it and
+	 * attach to the pi_state when the validation succeeds.
+	 */
+	if (match)
+		return attach_to_pi_state(uval, match->pi_state, ps);
+
+	/*
 	 * We are the first waiter - try to look up the real owner and attach
 	 * the new pi_state to it, but bail out when TID = 0 [1]
 	 */



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch 2/5] futex: Use futex_top_waiter() in lookup_pi_state()
  2014-06-11 20:45 [patch 0/5] futex: More robustness tweaks Thomas Gleixner
  2014-06-11 20:45 ` [patch 1/5] futex: Make unlock_pi more robust Thomas Gleixner
  2014-06-11 20:45 ` [patch 3/5] futex: Split out the waiter check from lookup_pi_state() Thomas Gleixner
@ 2014-06-11 20:45 ` Thomas Gleixner
  2014-06-16 16:51   ` Darren Hart
  2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
  2014-06-11 20:45 ` [patch 4/5] futex: Split out the first waiter attachment from lookup_pi_state() Thomas Gleixner
  2014-06-11 20:45 ` [patch 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust Thomas Gleixner
  4 siblings, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-11 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Kees Cook, wad

[-- Attachment #1: futex-use-futex-top-waiter-for-lookup.patch --]
[-- Type: text/plain, Size: 4173 bytes --]

No point in open coding the same function again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c |  128 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 63 insertions(+), 65 deletions(-)

Index: linux/kernel/futex.c
===================================================================
--- linux.orig/kernel/futex.c
+++ linux/kernel/futex.c
@@ -796,87 +796,85 @@ static int
 lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 		union futex_key *key, struct futex_pi_state **ps)
 {
+	struct futex_q *match = futex_top_waiter(hb, key);
 	struct futex_pi_state *pi_state = NULL;
-	struct futex_q *this, *next;
 	struct task_struct *p;
 	pid_t pid = uval & FUTEX_TID_MASK;
 
-	plist_for_each_entry_safe(this, next, &hb->chain, list) {
-		if (match_futex(&this->key, key)) {
+	if (match) {
+		/*
+		 * Sanity check the waiter before increasing the
+		 * refcount and attaching to it.
+		 */
+		pi_state = match->pi_state;
+		/*
+		 * Userspace might have messed up non-PI and PI
+		 * futexes [3]
+		 */
+		if (unlikely(!pi_state))
+			return -EINVAL;
+
+		WARN_ON(!atomic_read(&pi_state->refcount));
+
+		/*
+		 * Handle the owner died case:
+		 */
+		if (uval & FUTEX_OWNER_DIED) {
 			/*
-			 * Sanity check the waiter before increasing
-			 * the refcount and attaching to it.
+			 * exit_pi_state_list sets owner to NULL and
+			 * wakes the topmost waiter. The task which
+			 * acquires the pi_state->rt_mutex will fixup
+			 * owner.
 			 */
-			pi_state = this->pi_state;
-			/*
-			 * Userspace might have messed up non-PI and
-			 * PI futexes [3]
-			 */
-			if (unlikely(!pi_state))
-				return -EINVAL;
-
-			WARN_ON(!atomic_read(&pi_state->refcount));
-
-			/*
-			 * Handle the owner died case:
-			 */
-			if (uval & FUTEX_OWNER_DIED) {
-				/*
-				 * exit_pi_state_list sets owner to NULL and
-				 * wakes the topmost waiter. The task which
-				 * acquires the pi_state->rt_mutex will fixup
-				 * owner.
-				 */
-				if (!pi_state->owner) {
-					/*
-					 * No pi state owner, but the user
-					 * space TID is not 0. Inconsistent
-					 * state. [5]
-					 */
-					if (pid)
-						return -EINVAL;
-					/*
-					 * Take a ref on the state and
-					 * return. [4]
-					 */
-					goto out_state;
-				}
-
+			if (!pi_state->owner) {
 				/*
-				 * If TID is 0, then either the dying owner
-				 * has not yet executed exit_pi_state_list()
-				 * or some waiter acquired the rtmutex in the
-				 * pi state, but did not yet fixup the TID in
-				 * user space.
-				 *
-				 * Take a ref on the state and return. [6]
+				 * No pi state owner, but the user
+				 * space TID is not 0. Inconsistent
+				 * state. [5]
 				 */
-				if (!pid)
-					goto out_state;
-			} else {
+				if (pid)
+					return -EINVAL;
 				/*
-				 * If the owner died bit is not set,
-				 * then the pi_state must have an
-				 * owner. [7]
+				 * Take a ref on the state and
+				 * return. [4]
 				 */
-				if (!pi_state->owner)
-					return -EINVAL;
+				goto out_state;
 			}
 
 			/*
-			 * Bail out if user space manipulated the
-			 * futex value. If pi state exists then the
-			 * owner TID must be the same as the user
-			 * space TID. [9/10]
+			 * If TID is 0, then either the dying owner
+			 * has not yet executed exit_pi_state_list()
+			 * or some waiter acquired the rtmutex in the
+			 * pi state, but did not yet fixup the TID in
+			 * user space.
+			 *
+			 * Take a ref on the state and return. [6]
 			 */
-			if (pid != task_pid_vnr(pi_state->owner))
+			if (!pid)
+				goto out_state;
+		} else {
+			/*
+			 * If the owner died bit is not set,
+			 * then the pi_state must have an
+			 * owner. [7]
+			 */
+			if (!pi_state->owner)
 				return -EINVAL;
-
-		out_state:
-			atomic_inc(&pi_state->refcount);
-			*ps = pi_state;
-			return 0;
 		}
+
+		/*
+		 * Bail out if user space manipulated the
+		 * futex value. If pi state exists then the
+		 * owner TID must be the same as the user
+		 * space TID. [9/10]
+		 */
+		if (pid != task_pid_vnr(pi_state->owner))
+			return -EINVAL;
+
+	out_state:
+		atomic_inc(&pi_state->refcount);
+		*ps = pi_state;
+		return 0;
 	}
 
 	/*



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch 4/5] futex: Split out the first waiter attachment from lookup_pi_state()
  2014-06-11 20:45 [patch 0/5] futex: More robustness tweaks Thomas Gleixner
                   ` (2 preceding siblings ...)
  2014-06-11 20:45 ` [patch 2/5] futex: Use futex_top_waiter() in lookup_pi_state() Thomas Gleixner
@ 2014-06-11 20:45 ` Thomas Gleixner
  2014-06-16 18:19   ` Darren Hart
  2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
  2014-06-11 20:45 ` [patch 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust Thomas Gleixner
  4 siblings, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-11 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Kees Cook, wad

[-- Attachment #1: futex-split-out-attach-to-owner.patch --]
[-- Type: text/plain, Size: 2455 bytes --]

We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the code which attaches the first
waiter to the owner into a separate function. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c |   42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

Index: linux/kernel/futex.c
===================================================================
--- linux.orig/kernel/futex.c
+++ linux/kernel/futex.c
@@ -865,21 +865,16 @@ out_state:
 	return 0;
 }
 
-static int
-lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
-		union futex_key *key, struct futex_pi_state **ps)
+/*
+ * Lookup the task for the TID provided from user space and attach to
+ * it after doing proper sanity checks.
+ */
+static int attach_to_pi_owner(u32 uval, union futex_key *key,
+			      struct futex_pi_state **ps)
 {
-	struct futex_q *match = futex_top_waiter(hb, key);
-	struct futex_pi_state *pi_state = NULL;
-	struct task_struct *p;
 	pid_t pid = uval & FUTEX_TID_MASK;
-
-	/*
-	 * If there is a waiter on that futex, validate it and
-	 * attach to the pi_state when the validation succeeds.
-	 */
-	if (match)
-		return attach_to_pi_state(uval, match->pi_state, ps);
+	struct futex_pi_state *pi_state;
+	struct task_struct *p;
 
 	/*
 	 * We are the first waiter - try to look up the real owner and attach
@@ -922,7 +917,7 @@ lookup_pi_state(u32 uval, struct futex_h
 	pi_state = alloc_pi_state();
 
 	/*
-	 * Initialize the pi_mutex in locked state and make 'p'
+	 * Initialize the pi_mutex in locked state and make @p
 	 * the owner of it:
 	 */
 	rt_mutex_init_proxy_locked(&pi_state->pi_mutex, p);
@@ -942,6 +937,25 @@ lookup_pi_state(u32 uval, struct futex_h
 	return 0;
 }
 
+static int lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
+			   union futex_key *key, struct futex_pi_state **ps)
+{
+	struct futex_q *match = futex_top_waiter(hb, key);
+
+	/*
+	 * If there is a waiter on that futex, validate it and
+	 * attach to the pi_state when the validation succeeds.
+	 */
+	if (match)
+		return attach_to_pi_state(uval, match->pi_state, ps);
+
+	/*
+	 * We are the first waiter - try to look up the owner based on
+	 * @uval and attach to it.
+	 */
+	return attach_to_pi_owner(uval, key, ps);
+}
+
 /**
  * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
  * @uaddr:		the pi futex user address



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust
  2014-06-11 20:45 [patch 0/5] futex: More robustness tweaks Thomas Gleixner
                   ` (3 preceding siblings ...)
  2014-06-11 20:45 ` [patch 4/5] futex: Split out the first waiter attachment from lookup_pi_state() Thomas Gleixner
@ 2014-06-11 20:45 ` Thomas Gleixner
  2014-06-13  5:46   ` Darren Hart
  4 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-11 20:45 UTC (permalink / raw)
  To: LKML
  Cc: Peter Zijlstra, Darren Hart, Ingo Molnar, Davidlohr Bueso,
	Kees Cook, wad

[-- Attachment #1: futex-make-futex-lock-atomic-p-more-robust.patch --]
[-- Type: text/plain, Size: 5987 bytes --]

futex_lock_pi_atomic() is a maze of retry hoops and loops.

Reduce it to simple and understandable states:

First step is to lookup existing waiters (state) in the kernel.

If there is an existing waiter, validate it and attach to it.

If there is no existing waiter, check the user space value

If the TID encoded in the user space value is 0, take over the futex
preserving the owner died bit.

If the TID encoded in the user space value is != 0, lookup the owner
task, validate it and attach to it.

Reduces text size by 144 bytes on x8664.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c |  139 +++++++++++++++++++++------------------------------------
 1 file changed, 53 insertions(+), 86 deletions(-)

Index: linux/kernel/futex.c
===================================================================
--- linux.orig/kernel/futex.c
+++ linux/kernel/futex.c
@@ -956,6 +956,17 @@ static int lookup_pi_state(u32 uval, str
 	return attach_to_pi_owner(uval, key, ps);
 }
 
+static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
+{
+	u32 uninitialized_var(curval);
+
+	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
+		return -EFAULT;
+
+	/* If user space value changed, let the caller retry */
+	return curval != uval ? -EAGAIN : 0;
+}
+
 /**
  * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
  * @uaddr:		the pi futex user address
@@ -979,113 +990,67 @@ static int futex_lock_pi_atomic(u32 __us
 				struct futex_pi_state **ps,
 				struct task_struct *task, int set_waiters)
 {
-	int lock_taken, ret, force_take = 0;
-	u32 uval, newval, curval, vpid = task_pid_vnr(task);
-
-retry:
-	ret = lock_taken = 0;
+	u32 uval, newval, vpid = task_pid_vnr(task);
+	struct futex_q *match;
+	int ret;
 
 	/*
-	 * To avoid races, we attempt to take the lock here again
-	 * (by doing a 0 -> TID atomic cmpxchg), while holding all
-	 * the locks. It will most likely not succeed.
+	 * Read the user space value first so we can validate a few
+	 * things before proceeding further.
 	 */
-	newval = vpid;
-	if (set_waiters)
-		newval |= FUTEX_WAITERS;
-
-	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)))
+	if (get_futex_value_locked(&uval, uaddr))
 		return -EFAULT;
 
 	/*
 	 * Detect deadlocks.
 	 */
-	if ((unlikely((curval & FUTEX_TID_MASK) == vpid)))
+	if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))
 		return -EDEADLK;
 
 	/*
-	 * Surprise - we got the lock, but we do not trust user space at all.
+	 * Lookup existing state first. If it exists, try to attach to
+	 * its pi_state.
 	 */
-	if (unlikely(!curval)) {
-		/*
-		 * We verify whether there is kernel state for this
-		 * futex. If not, we can safely assume, that the 0 ->
-		 * TID transition is correct. If state exists, we do
-		 * not bother to fixup the user space state as it was
-		 * corrupted already.
-		 */
-		return futex_top_waiter(hb, key) ? -EINVAL : 1;
-	}
-
-	uval = curval;
+	match = futex_top_waiter(hb, key);
+	if (match)
+		return attach_to_pi_state(uval, match->pi_state, ps);
 
 	/*
-	 * Set the FUTEX_WAITERS flag, so the owner will know it has someone
-	 * to wake at the next unlock.
+	 * No waiter and user TID is 0. We are here because the
+	 * waiters or the owner died bit is set or called from
+	 * requeue_cmp_pi or for whatever reason something took the
+	 * syscall.
 	 */
-	newval = curval | FUTEX_WAITERS;
-
-	/*
-	 * Should we force take the futex? See below.
-	 */
-	if (unlikely(force_take)) {
+	if (!(uval & FUTEX_TID_MASK)) {
 		/*
-		 * Keep the OWNER_DIED and the WAITERS bit and set the
-		 * new TID value.
+		 * We take over the futex. No other waiters and the user space
+		 * TID is 0. We preserve the owner died bit.
 		 */
-		newval = (curval & ~FUTEX_TID_MASK) | vpid;
-		force_take = 0;
-		lock_taken = 1;
-	}
+		newval = uval & FUTEX_OWNER_DIED;
+		newval |= vpid;
 
-	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
-		return -EFAULT;
-	if (unlikely(curval != uval))
-		goto retry;
+		/* The futex requeue_pi code can enforce the waiters bit */
+		if (set_waiters)
+			newval |= FUTEX_WAITERS;
+
+		return lock_pi_update_atomic(uaddr, uval, newval);
+	}
 
 	/*
-	 * We took the lock due to forced take over.
+	 * First waiter. Set the waiters bit before attaching ourself to
+	 * the owner. If owner tries to unlock, it will be forced into
+	 * the kernel and blocked on hb->lock.
 	 */
-	if (unlikely(lock_taken))
-		return 1;
-
+	newval = uval | FUTEX_WAITERS;
+	ret = lock_pi_update_atomic(uaddr, uval, newval);
+	if (ret)
+		return ret;
 	/*
-	 * We dont have the lock. Look up the PI state (or create it if
-	 * we are the first waiter):
+	 * If the update of the user space value succeeded, we try to
+	 * attach to the owner. If that fails, no harm done, we only
+	 * set the FUTEX_WAITERS bit in the user space variable.
 	 */
-	ret = lookup_pi_state(uval, hb, key, ps);
-
-	if (unlikely(ret)) {
-		switch (ret) {
-		case -ESRCH:
-			/*
-			 * We failed to find an owner for this
-			 * futex. So we have no pi_state to block
-			 * on. This can happen in two cases:
-			 *
-			 * 1) The owner died
-			 * 2) A stale FUTEX_WAITERS bit
-			 *
-			 * Re-read the futex value.
-			 */
-			if (get_futex_value_locked(&curval, uaddr))
-				return -EFAULT;
-
-			/*
-			 * If the owner died or we have a stale
-			 * WAITERS bit the owner TID in the user space
-			 * futex is 0.
-			 */
-			if (!(curval & FUTEX_TID_MASK)) {
-				force_take = 1;
-				goto retry;
-			}
-		default:
-			break;
-		}
-	}
-
-	return ret;
+	return attach_to_pi_owner(uval, key, ps);
 }
 
 /**
@@ -2316,8 +2281,10 @@ retry_private:
 			goto uaddr_faulted;
 		case -EAGAIN:
 			/*
-			 * Task is exiting and we just wait for the
-			 * exit to complete.
+			 * Two reasons for this:
+			 * - Task is exiting and we just wait for the
+			 *   exit to complete.
+			 * - The user space value changed.
 			 */
 			queue_unlock(hb);
 			put_futex_key(&q.key);



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust
  2014-06-11 20:45 ` [patch 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust Thomas Gleixner
@ 2014-06-13  5:46   ` Darren Hart
  2014-06-13  8:34     ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Darren Hart @ 2014-06-13  5:46 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> futex_lock_pi_atomic() is a maze of retry hoops and loops.
> 
> Reduce it to simple and understandable states:

Heh... well...

With this patch applied (1-4 will not reproduce without 5), if userspace
wrongly sets the uval to 0, the pi_state can end up being NULL for a
subsequent requeue_pi operation:

[   10.426159] requeue: 00000000006022e0 to 00000000006022e4
[   10.427737]   this:ffff88013a637da8
[   10.428749]   waking:ffff88013a637da8
fut2 = 0
[   10.429994]   comparing requeue_pi_key
[   10.431034]   prepare waiter to take the rt_mutex
[   10.432344]   pi_state:          (null)
[   10.433414] BUG: unable to handle kernel NULL pointer dereference at
0000000000000038

This occurs in the requeue loop, in the requeue_pi block at:

	atomic_inc(&pi_state->refcount);


> 
> First step is to lookup existing waiters (state) in the kernel.
> 
> If there is an existing waiter, validate it and attach to it.
> 
> If there is no existing waiter, check the user space value
> 
> If the TID encoded in the user space value is 0, take over the futex
> preserving the owner died bit.

This is the scenario resulting in the panic. See below.

> 
> If the TID encoded in the user space value is != 0, lookup the owner
> task, validate it and attach to it.
> 
> Reduces text size by 144 bytes on x8664.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/futex.c |  139 +++++++++++++++++++++------------------------------------
>  1 file changed, 53 insertions(+), 86 deletions(-)
> 
> Index: linux/kernel/futex.c
> ===================================================================
> --- linux.orig/kernel/futex.c
> +++ linux/kernel/futex.c
> @@ -956,6 +956,17 @@ static int lookup_pi_state(u32 uval, str
>         return attach_to_pi_owner(uval, key, ps);
>  }
>  
> +static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
> +{
> +       u32 uninitialized_var(curval);
> +
> +       if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
> +               return -EFAULT;
> +
> +       /* If user space value changed, let the caller retry */
> +       return curval != uval ? -EAGAIN : 0;
> +}
> +
>  /**
>   * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
>   * @uaddr:             the pi futex user address
> @@ -979,113 +990,67 @@ static int futex_lock_pi_atomic(u32 __us
>                                 struct futex_pi_state **ps,
>                                 struct task_struct *task, int set_waiters)
>  {
> -       int lock_taken, ret, force_take = 0;
> -       u32 uval, newval, curval, vpid = task_pid_vnr(task);
> -
> -retry:
> -       ret = lock_taken = 0;
> +       u32 uval, newval, vpid = task_pid_vnr(task);
> +       struct futex_q *match;
> +       int ret;
>  
>         /*
> -        * To avoid races, we attempt to take the lock here again
> -        * (by doing a 0 -> TID atomic cmpxchg), while holding all
> -        * the locks. It will most likely not succeed.
> +        * Read the user space value first so we can validate a few
> +        * things before proceeding further.
>          */
> -       newval = vpid;
> -       if (set_waiters)
> -               newval |= FUTEX_WAITERS;
> -
> -       if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)))
> +       if (get_futex_value_locked(&uval, uaddr))
>                 return -EFAULT;
>  
>         /*
>          * Detect deadlocks.
>          */
> -       if ((unlikely((curval & FUTEX_TID_MASK) == vpid)))
> +       if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))
>                 return -EDEADLK;
>  
>         /*
> -        * Surprise - we got the lock, but we do not trust user space at all.

We really don't want to drop this comment (at leas not the latter
half ;-)

> +        * Lookup existing state first. If it exists, try to attach to
> +        * its pi_state.
>          */
> -       if (unlikely(!curval)) {
> -               /*
> -                * We verify whether there is kernel state for this
> -                * futex. If not, we can safely assume, that the 0 ->
> -                * TID transition is correct. If state exists, we do
> -                * not bother to fixup the user space state as it was
> -                * corrupted already.
> -                */
> -               return futex_top_waiter(hb, key) ? -EINVAL : 1;

On successful acquisition, we used to return 1...

> -       }
> -
> -       uval = curval;
> +       match = futex_top_waiter(hb, key);
> +       if (match)
> +               return attach_to_pi_state(uval, match->pi_state, ps);
>  
>         /*
> -        * Set the FUTEX_WAITERS flag, so the owner will know it has someone
> -        * to wake at the next unlock.
> +        * No waiter and user TID is 0. We are here because the
> +        * waiters or the owner died bit is set or called from
> +        * requeue_cmp_pi or for whatever reason something took the
> +        * syscall.
>          */
> -       newval = curval | FUTEX_WAITERS;
> -
> -       /*
> -        * Should we force take the futex? See below.
> -        */
> -       if (unlikely(force_take)) {
> +       if (!(uval & FUTEX_TID_MASK)) {
>                 /*
> -                * Keep the OWNER_DIED and the WAITERS bit and set the
> -                * new TID value.
> +                * We take over the futex. No other waiters and the user space
> +                * TID is 0. We preserve the owner died bit.
>                  */

Or userspace is screwing with us and set it to 0 for no good reason at
all... so there could still be a waiter queued up from
FUTEX_WAIT_REQUEUE_PI.

> -               newval = (curval & ~FUTEX_TID_MASK) | vpid;
> -               force_take = 0;
> -               lock_taken = 1;
> -       }
> +               newval = uval & FUTEX_OWNER_DIED;
> +               newval |= vpid;
>  
> -       if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
> -               return -EFAULT;
> -       if (unlikely(curval != uval))
> -               goto retry;
> +               /* The futex requeue_pi code can enforce the waiters bit */
> +               if (set_waiters)
> +                       newval |= FUTEX_WAITERS;
> +
> +               return lock_pi_update_atomic(uaddr, uval, newval);

And now we return 0, resulting in futex_proxy_trylock_atomic also
returning 0, but the pi_state is NULL, and as it doesn't return > 0
(vpid), we don't look it up again after atomic acquisition in
futex_requeue().

And BANG.

Consider the following:

>From 3eec2db2654a6cd8dcac3c60acda0f16a3243e29 Mon Sep 17 00:00:00 2001
From: Darren Hart <dvhart@linux.intel.com>
Date: Thu, 12 Jun 2014 22:41:32 -0700
Subject: [PATCH] futex: Invert the return value of lock_pi_update_atomic

Indicate success to the caller by returning 1 for lock acquired.

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
---
 kernel/futex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 391df53..82b96a4 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1033,7 +1033,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr,
struct futex_hash_bucket *hb,
 		if (set_waiters)
 			newval |= FUTEX_WAITERS;
 
-		return lock_pi_update_atomic(uaddr, uval, newval);
+		return !lock_pi_update_atomic(uaddr, uval, newval);
 	}
 
 	/*
-- 
2.0.0.rc2




--
Darren

> +       }
>  
>         /*
> -        * We took the lock due to forced take over.
> +        * First waiter. Set the waiters bit before attaching ourself to
> +        * the owner. If owner tries to unlock, it will be forced into
> +        * the kernel and blocked on hb->lock.
>          */
> -       if (unlikely(lock_taken))
> -               return 1;
> -
> +       newval = uval | FUTEX_WAITERS;
> +       ret = lock_pi_update_atomic(uaddr, uval, newval);
> +       if (ret)
> +               return ret;
>         /*
> -        * We dont have the lock. Look up the PI state (or create it if
> -        * we are the first waiter):
> +        * If the update of the user space value succeeded, we try to
> +        * attach to the owner. If that fails, no harm done, we only
> +        * set the FUTEX_WAITERS bit in the user space variable.
>          */
> -       ret = lookup_pi_state(uval, hb, key, ps);
> -
> -       if (unlikely(ret)) {
> -               switch (ret) {
> -               case -ESRCH:
> -                       /*
> -                        * We failed to find an owner for this
> -                        * futex. So we have no pi_state to block
> -                        * on. This can happen in two cases:
> -                        *
> -                        * 1) The owner died
> -                        * 2) A stale FUTEX_WAITERS bit
> -                        *
> -                        * Re-read the futex value.
> -                        */
> -                       if (get_futex_value_locked(&curval, uaddr))
> -                               return -EFAULT;
> -
> -                       /*
> -                        * If the owner died or we have a stale
> -                        * WAITERS bit the owner TID in the user space
> -                        * futex is 0.
> -                        */
> -                       if (!(curval & FUTEX_TID_MASK)) {
> -                               force_take = 1;
> -                               goto retry;
> -                       }
> -               default:
> -                       break;
> -               }
> -       }
> -
> -       return ret;
> +       return attach_to_pi_owner(uval, key, ps);
>  }
>  
>  /**
> @@ -2316,8 +2281,10 @@ retry_private:
>                         goto uaddr_faulted;
>                 case -EAGAIN:
>                         /*
> -                        * Task is exiting and we just wait for the
> -                        * exit to complete.
> +                        * Two reasons for this:
> +                        * - Task is exiting and we just wait for the
> +                        *   exit to complete.
> +                        * - The user space value changed.
>                          */
>                         queue_unlock(hb);
>                         put_futex_key(&q.key);
> 



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [patch 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust
  2014-06-13  5:46   ` Darren Hart
@ 2014-06-13  8:34     ` Thomas Gleixner
  2014-06-13  9:36       ` Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-13  8:34 UTC (permalink / raw)
  To: Darren Hart
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Thu, 12 Jun 2014, Darren Hart wrote:

> On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> > futex_lock_pi_atomic() is a maze of retry hoops and loops.
> > 
> > Reduce it to simple and understandable states:
> 
> Heh... well...
> 
> With this patch applied (1-4 will not reproduce without 5), if userspace
> wrongly sets the uval to 0, the pi_state can end up being NULL for a
> subsequent requeue_pi operation:
> 
> [   10.426159] requeue: 00000000006022e0 to 00000000006022e4
> [   10.427737]   this:ffff88013a637da8
> [   10.428749]   waking:ffff88013a637da8
> fut2 = 0
> [   10.429994]   comparing requeue_pi_key
> [   10.431034]   prepare waiter to take the rt_mutex
> [   10.432344]   pi_state:          (null)
> [   10.433414] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000038
> 
> This occurs in the requeue loop, in the requeue_pi block at:
> 
> 	atomic_inc(&pi_state->refcount);

Hmm. Took me some time to reproduce. Digging into it now.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust
  2014-06-13  8:34     ` Thomas Gleixner
@ 2014-06-13  9:36       ` Thomas Gleixner
  2014-06-13  9:44         ` [patch V2 " Thomas Gleixner
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-13  9:36 UTC (permalink / raw)
  To: Darren Hart
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Fri, 13 Jun 2014, Thomas Gleixner wrote:
> On Thu, 12 Jun 2014, Darren Hart wrote:
> 
> > On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> > > futex_lock_pi_atomic() is a maze of retry hoops and loops.
> > > 
> > > Reduce it to simple and understandable states:
> > 
> > Heh... well...
> > 
> > With this patch applied (1-4 will not reproduce without 5), if userspace
> > wrongly sets the uval to 0, the pi_state can end up being NULL for a
> > subsequent requeue_pi operation:
> > 
> > [   10.426159] requeue: 00000000006022e0 to 00000000006022e4
> > [   10.427737]   this:ffff88013a637da8
> > [   10.428749]   waking:ffff88013a637da8
> > fut2 = 0
> > [   10.429994]   comparing requeue_pi_key
> > [   10.431034]   prepare waiter to take the rt_mutex
> > [   10.432344]   pi_state:          (null)
> > [   10.433414] BUG: unable to handle kernel NULL pointer dereference at
> > 0000000000000038
> > 
> > This occurs in the requeue loop, in the requeue_pi block at:
> > 
> > 	atomic_inc(&pi_state->refcount);
> 
> Hmm. Took me some time to reproduce. Digging into it now.

I'm a moron. Ran out of brown paperbags ....

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [patch V2 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust
  2014-06-13  9:36       ` Thomas Gleixner
@ 2014-06-13  9:44         ` Thomas Gleixner
  2014-06-13 20:51           ` Davidlohr Bueso
                             ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-13  9:44 UTC (permalink / raw)
  To: Darren Hart
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

Subject: futex: Simplify futex_lock_pi_atomic() and make it more robust
From: Thomas Gleixner <tglx@linutronix.de>
Date: Wed, 11 Jun 2014 20:45:41 -0000

futex_lock_pi_atomic() is a maze of retry hoops and loops.

Reduce it to simple and understandable states:

First step is to lookup existing waiters (state) in the kernel.

If there is an existing waiter, validate it and attach to it.

If there is no existing waiter, check the user space value

If the TID encoded in the user space value is 0, take over the futex
preserving the owner died bit.

If the TID encoded in the user space value is != 0, lookup the owner
task, validate it and attach to it.

Reduces text size by 128 bytes on x8664.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.361836310@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---

V2: Fixed the brown paperbag bug of V1

 kernel/futex.c |  141 ++++++++++++++++++++++-----------------------------------
 1 file changed, 55 insertions(+), 86 deletions(-)

Index: linux/kernel/futex.c
===================================================================
--- linux.orig/kernel/futex.c
+++ linux/kernel/futex.c
@@ -956,6 +956,17 @@ static int lookup_pi_state(u32 uval, str
 	return attach_to_pi_owner(uval, key, ps);
 }
 
+static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
+{
+	u32 uninitialized_var(curval);
+
+	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
+		return -EFAULT;
+
+	/*If user space value changed, let the caller retry */
+	return curval != uval ? -EAGAIN : 0;
+}
+
 /**
  * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
  * @uaddr:		the pi futex user address
@@ -979,113 +990,69 @@ static int futex_lock_pi_atomic(u32 __us
 				struct futex_pi_state **ps,
 				struct task_struct *task, int set_waiters)
 {
-	int lock_taken, ret, force_take = 0;
-	u32 uval, newval, curval, vpid = task_pid_vnr(task);
-
-retry:
-	ret = lock_taken = 0;
+	u32 uval, newval, vpid = task_pid_vnr(task);
+	struct futex_q *match;
+	int ret;
 
 	/*
-	 * To avoid races, we attempt to take the lock here again
-	 * (by doing a 0 -> TID atomic cmpxchg), while holding all
-	 * the locks. It will most likely not succeed.
+	 * Read the user space value first so we can validate a few
+	 * things before proceeding further.
 	 */
-	newval = vpid;
-	if (set_waiters)
-		newval |= FUTEX_WAITERS;
-
-	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)))
+	if (get_futex_value_locked(&uval, uaddr))
 		return -EFAULT;
 
 	/*
 	 * Detect deadlocks.
 	 */
-	if ((unlikely((curval & FUTEX_TID_MASK) == vpid)))
+	if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))
 		return -EDEADLK;
 
 	/*
-	 * Surprise - we got the lock, but we do not trust user space at all.
+	 * Lookup existing state first. If it exists, try to attach to
+	 * its pi_state.
 	 */
-	if (unlikely(!curval)) {
-		/*
-		 * We verify whether there is kernel state for this
-		 * futex. If not, we can safely assume, that the 0 ->
-		 * TID transition is correct. If state exists, we do
-		 * not bother to fixup the user space state as it was
-		 * corrupted already.
-		 */
-		return futex_top_waiter(hb, key) ? -EINVAL : 1;
-	}
-
-	uval = curval;
+	match = futex_top_waiter(hb, key);
+	if (match)
+		return attach_to_pi_state(uval, match->pi_state, ps);
 
 	/*
-	 * Set the FUTEX_WAITERS flag, so the owner will know it has someone
-	 * to wake at the next unlock.
+	 * No waiter and user TID is 0. We are here because the
+	 * waiters or the owner died bit is set or called from
+	 * requeue_cmp_pi or for whatever reason something took the
+	 * syscall.
 	 */
-	newval = curval | FUTEX_WAITERS;
-
-	/*
-	 * Should we force take the futex? See below.
-	 */
-	if (unlikely(force_take)) {
+	if (!(uval & FUTEX_TID_MASK)) {
 		/*
-		 * Keep the OWNER_DIED and the WAITERS bit and set the
-		 * new TID value.
+		 * We take over the futex. No other waiters and the user space
+		 * TID is 0. We preserve the owner died bit.
 		 */
-		newval = (curval & ~FUTEX_TID_MASK) | vpid;
-		force_take = 0;
-		lock_taken = 1;
-	}
+		newval = uval & FUTEX_OWNER_DIED;
+		newval |= vpid;
 
-	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
-		return -EFAULT;
-	if (unlikely(curval != uval))
-		goto retry;
+		/* The futex requeue_pi code can enforce the waiters bit */
+		if (set_waiters)
+			newval |= FUTEX_WAITERS;
+
+		ret = lock_pi_update_atomic(uaddr, uval, newval);
+		/* If the take over worked, return 1 */
+		return ret < 0 ? ret : 1;
+	}
 
 	/*
-	 * We took the lock due to forced take over.
+	 * First waiter. Set the waiters bit before attaching ourself to
+	 * the owner. If owner tries to unlock, it will be forced into
+	 * the kernel and blocked on hb->lock.
 	 */
-	if (unlikely(lock_taken))
-		return 1;
-
+	newval = uval | FUTEX_WAITERS;
+	ret = lock_pi_update_atomic(uaddr, uval, newval);
+	if (ret)
+		return ret;
 	/*
-	 * We dont have the lock. Look up the PI state (or create it if
-	 * we are the first waiter):
+	 * If the update of the user space value succeeded, we try to
+	 * attach to the owner. If that fails, no harm done, we only
+	 * set the FUTEX_WAITERS bit in the user space variable.
 	 */
-	ret = lookup_pi_state(uval, hb, key, ps);
-
-	if (unlikely(ret)) {
-		switch (ret) {
-		case -ESRCH:
-			/*
-			 * We failed to find an owner for this
-			 * futex. So we have no pi_state to block
-			 * on. This can happen in two cases:
-			 *
-			 * 1) The owner died
-			 * 2) A stale FUTEX_WAITERS bit
-			 *
-			 * Re-read the futex value.
-			 */
-			if (get_futex_value_locked(&curval, uaddr))
-				return -EFAULT;
-
-			/*
-			 * If the owner died or we have a stale
-			 * WAITERS bit the owner TID in the user space
-			 * futex is 0.
-			 */
-			if (!(curval & FUTEX_TID_MASK)) {
-				force_take = 1;
-				goto retry;
-			}
-		default:
-			break;
-		}
-	}
-
-	return ret;
+	return attach_to_pi_owner(uval, key, ps);
 }
 
 /**
@@ -2316,8 +2283,10 @@ retry_private:
 			goto uaddr_faulted;
 		case -EAGAIN:
 			/*
-			 * Task is exiting and we just wait for the
-			 * exit to complete.
+			 * Two reasons for this:
+			 * - Task is exiting and we just wait for the
+			 *   exit to complete.
+			 * - The user space value changed.
 			 */
 			queue_unlock(hb);
 			put_futex_key(&q.key);

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch V2 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust
  2014-06-13  9:44         ` [patch V2 " Thomas Gleixner
@ 2014-06-13 20:51           ` Davidlohr Bueso
  2014-06-16 20:36           ` Darren Hart
  2014-06-21 20:34           ` [tip:locking/core] " tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 26+ messages in thread
From: Davidlohr Bueso @ 2014-06-13 20:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, LKML, Peter Zijlstra, Ingo Molnar, Kees Cook, wad

On Fri, 2014-06-13 at 11:44 +0200, Thomas Gleixner wrote:
> Subject: futex: Simplify futex_lock_pi_atomic() and make it more robust
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 11 Jun 2014 20:45:41 -0000
> 
> futex_lock_pi_atomic() is a maze of retry hoops and loops.
> 
> Reduce it to simple and understandable states:
> 
> First step is to lookup existing waiters (state) in the kernel.
> 
> If there is an existing waiter, validate it and attach to it.
> 
> If there is no existing waiter, check the user space value
> 
> If the TID encoded in the user space value is 0, take over the futex
> preserving the owner died bit.
> 
> If the TID encoded in the user space value is != 0, lookup the owner
> task, validate it and attach to it.
> 
> Reduces text size by 128 bytes on x8664.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Darren Hart <darren@dvhart.com>
> Cc: Davidlohr Bueso <davidlohr@hp.com>
> Cc: Kees Cook <kees@outflux.net>
> Cc: wad@chromium.org
> Link: http://lkml.kernel.org/r/20140611204237.361836310@linutronix.de
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> 
> V2: Fixed the brown paperbag bug of V1

I confirm this v2 fixes the issue. Passes 5 hr pounding on my 80-core
system. Unsurprisingly, I didn't see any performance regressions either.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 1/5] futex: Make unlock_pi more robust
  2014-06-11 20:45 ` [patch 1/5] futex: Make unlock_pi more robust Thomas Gleixner
@ 2014-06-16 16:18   ` Darren Hart
  2014-06-16 22:15     ` Thomas Gleixner
  2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Darren Hart @ 2014-06-16 16:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> The kernel tries to atomically unlock the futex without checking
> whether there is kernel state associated to the futex.
> 
> So if user space manipulated the user space value, this will leave
> kernel internal state around associated to the owner task. 
> 
> For robustness sake, lookup first whether there are waiters on the
> futex. If there are waiters, wake the top priority waiter with all the
> proper sanity checks applied.
> 
> If there are no waiters, do the atomic release. We do not have to
> preserve the waiters bit in this case, because a potentially incoming
> waiter is blocked on the hb->lock and will acquire the futex
> atomically. We neither have to preserve the owner died bit. The caller
> is the owner and it was supposed to cleanup the mess.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/futex.c |   78
> +++++++++++++++++++--------------------------------------
>  1 file changed, 26 insertions(+), 52 deletions(-)
> 
> Index: linux/kernel/futex.c
> ===================================================================
> --- linux.orig/kernel/futex.c
> +++ linux/kernel/futex.c
> @@ -1186,22 +1186,6 @@ static int wake_futex_pi(u32 __user *uad
>         return 0;
>  }
>  
> -static int unlock_futex_pi(u32 __user *uaddr, u32 uval)
> -{
> -       u32 uninitialized_var(oldval);
> -
> -       /*
> -        * There is no waiter, so we unlock the futex. The owner died
> -        * bit has not to be preserved here. We are the owner:
> -        */
> -       if (cmpxchg_futex_value_locked(&oldval, uaddr, uval, 0))
> -               return -EFAULT;
> -       if (oldval != uval)
> -               return -EAGAIN;
> -
> -       return 0;
> -}

I thought we used that in multiple places, but apparently not. Worth
consolidating.

> -
>  /*
>   * Express the locking dependencies for lockdep:
>   */
> @@ -2401,10 +2385,10 @@ uaddr_faulted:
>   */
>  static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
>  {
> -       struct futex_hash_bucket *hb;
> -       struct futex_q *this, *next;
> +       u32 uninitialized_var(curval), uval, vpid =
> task_pid_vnr(current);
>         union futex_key key = FUTEX_KEY_INIT;
> -       u32 uval, vpid = task_pid_vnr(current);
> +       struct futex_hash_bucket *hb;
> +       struct futex_q *match;
>         int ret;
>  
>  retry:
> @@ -2417,57 +2401,47 @@ retry:
>                 return -EPERM;
>  
>         ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key,
> VERIFY_WRITE);
> -       if (unlikely(ret != 0))
> -               goto out;
> +       if (ret)
> +               return ret;

Looks like you're also trying to move away from a single exit point to
multiple exit points. I prefer the single exit (which you've probably
noticed :-), but it's a subjective thing, and so long as we are not
duplicating cleanup logic, I guess it's fine either way. This change was
not mentioned in the commit message though.

>  
>         hb = hash_futex(&key);
>         spin_lock(&hb->lock);
>  
>         /*
> -        * To avoid races, try to do the TID -> 0 atomic transition
> -        * again. If it succeeds then we can return without waking
> -        * anyone else up. We only try this if neither the waiters nor
> -        * the owner died bit are set.
> -        */
> -       if (!(uval & ~FUTEX_TID_MASK) &&
> -           cmpxchg_futex_value_locked(&uval, uaddr, vpid, 0))
> -               goto pi_faulted;
> -       /*
> -        * Rare case: we managed to release the lock atomically,
> -        * no need to wake anyone else up:
> -        */
> -       if (unlikely(uval == vpid))
> -               goto out_unlock;
> -
> -       /*
> -        * Ok, other tasks may need to be woken up - check waiters
> -        * and do the wakeup if necessary:
> -        */
> -       plist_for_each_entry_safe(this, next, &hb->chain, list) {
> -               if (!match_futex (&this->key, &key))
> -                       continue;
> -               ret = wake_futex_pi(uaddr, uval, this);
> +        * Check waiters first. We do not trust user space values at
> +        * all and we at least want to know if user space fiddled
> +        * with the futex value instead of blindly unlocking.
> +        */
> +       match = futex_top_waiter(hb, &key);
> +       if (match) {
> +               ret = wake_futex_pi(uaddr, uval, match);
>                 /*
> -                * The atomic access to the futex value
> -                * generated a pagefault, so retry the
> -                * user-access and the wakeup:
> +                * The atomic access to the futex value generated a
> +                * pagefault, so retry the user-access and the wakeup:
>                  */
>                 if (ret == -EFAULT)
>                         goto pi_faulted;
>                 goto out_unlock;
>         }
> +
>         /*
> -        * No waiters - kernel unlocks the futex:
> +        * We have no kernel internal state, i.e. no waiters in the
> +        * kernel. Waiters which are about to queue themself are stuck

themselves

> +        * on hb->lock. So we can safely ignore them. We do neither
> +        * preserve the WAITERS bit not the OWNER_DIED one. We are the

We preserve neither the WAITERS bit nor the OWNER_DIED bit.
(the above use of "do" and "not" is incorrect and could easily be
misinterpreted).

> +        * owner.

In wake_futex_pi we verify ownership by matching pi_state->owner ==
current, but here the only test is the TID value, which is set by
userspace - which we don't trust...

I'm trying to determine if it matters in this case... if there are no
waiters, is the pi_state still around? If so, it does indeed matter, and
we should be verifying.


>          */
> -       ret = unlock_futex_pi(uaddr, uval);
> -       if (ret == -EFAULT)
> +       if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))
>                 goto pi_faulted;
>  

This refactoring seems like it would be best done as a prequel patch so
as not to confuse cleanup with functional change. At least that is what
you and others have beaten into me over the years ;-)

> +       /*
> +        * If uval has changed, let user space handle it.
> +        */
> +       ret = (curval == uval) ? 0 : -EAGAIN;
> +
>  out_unlock:
>         spin_unlock(&hb->lock);
>         put_futex_key(&key);
> -
> -out:
>         return ret;
>  

By dropping this you won't return ret, but rather fall through into
pi_faulted... which certainly isn't what you wanted.

The need for better test coverage is very evident now :-)

--
Darren

>  pi_faulted:
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 2/5] futex: Use futex_top_waiter() in lookup_pi_state()
  2014-06-11 20:45 ` [patch 2/5] futex: Use futex_top_waiter() in lookup_pi_state() Thomas Gleixner
@ 2014-06-16 16:51   ` Darren Hart
  2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: Darren Hart @ 2014-06-16 16:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> No point in open coding the same function again.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  kernel/futex.c |  128
> ++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 63 insertions(+), 65 deletions(-)
> 
> Index: linux/kernel/futex.c
> ===================================================================
> --- linux.orig/kernel/futex.c
> +++ linux/kernel/futex.c
> @@ -796,87 +796,85 @@ static int
>  lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
>                 union futex_key *key, struct futex_pi_state **ps)
>  {
> +       struct futex_q *match = futex_top_waiter(hb, key);
>         struct futex_pi_state *pi_state = NULL;
> -       struct futex_q *this, *next;
>         struct task_struct *p;
>         pid_t pid = uval & FUTEX_TID_MASK;
>  
> -       plist_for_each_entry_safe(this, next, &hb->chain, list) {
> -               if (match_futex(&this->key, key)) {
> +       if (match) {
> +               /*
> +                * Sanity check the waiter before increasing the
> +                * refcount and attaching to it.
> +                */
> +               pi_state = match->pi_state;
> +               /*
> +                * Userspace might have messed up non-PI and PI
> +                * futexes [3]
> +                */
> +               if (unlikely(!pi_state))
> +                       return -EINVAL;
> +
> +               WARN_ON(!atomic_read(&pi_state->refcount));
> +
> +               /*
> +                * Handle the owner died case:
> +                */
> +               if (uval & FUTEX_OWNER_DIED) {
>                         /*
> -                        * Sanity check the waiter before increasing
> -                        * the refcount and attaching to it.
> +                        * exit_pi_state_list sets owner to NULL and
> +                        * wakes the topmost waiter. The task which
> +                        * acquires the pi_state->rt_mutex will fixup
> +                        * owner.
>                          */
> -                       pi_state = this->pi_state;
> -                       /*
> -                        * Userspace might have messed up non-PI and
> -                        * PI futexes [3]
> -                        */
> -                       if (unlikely(!pi_state))
> -                               return -EINVAL;
> -
> -                       WARN_ON(!atomic_read(&pi_state->refcount));
> -
> -                       /*
> -                        * Handle the owner died case:
> -                        */
> -                       if (uval & FUTEX_OWNER_DIED) {
> -                               /*
> -                                * exit_pi_state_list sets owner to
> NULL and
> -                                * wakes the topmost waiter. The task
> which
> -                                * acquires the pi_state->rt_mutex
> will fixup
> -                                * owner.
> -                                */
> -                               if (!pi_state->owner) {
> -                                       /*
> -                                        * No pi state owner, but the
> user
> -                                        * space TID is not 0.
> Inconsistent
> -                                        * state. [5]
> -                                        */
> -                                       if (pid)
> -                                               return -EINVAL;
> -                                       /*
> -                                        * Take a ref on the state and
> -                                        * return. [4]
> -                                        */
> -                                       goto out_state;
> -                               }
> -
> +                       if (!pi_state->owner) {
>                                 /*
> -                                * If TID is 0, then either the dying
> owner
> -                                * has not yet executed
> exit_pi_state_list()
> -                                * or some waiter acquired the rtmutex
> in the
> -                                * pi state, but did not yet fixup the
> TID in
> -                                * user space.
> -                                *
> -                                * Take a ref on the state and return.
> [6]
> +                                * No pi state owner, but the user
> +                                * space TID is not 0. Inconsistent
> +                                * state. [5]
>                                  */
> -                               if (!pid)
> -                                       goto out_state;
> -                       } else {
> +                               if (pid)
> +                                       return -EINVAL;
>                                 /*
> -                                * If the owner died bit is not set,
> -                                * then the pi_state must have an
> -                                * owner. [7]
> +                                * Take a ref on the state and
> +                                * return. [4]
>                                  */
> -                               if (!pi_state->owner)
> -                                       return -EINVAL;
> +                               goto out_state;
>                         }
>  
>                         /*
> -                        * Bail out if user space manipulated the
> -                        * futex value. If pi state exists then the
> -                        * owner TID must be the same as the user
> -                        * space TID. [9/10]
> +                        * If TID is 0, then either the dying owner
> +                        * has not yet executed exit_pi_state_list()
> +                        * or some waiter acquired the rtmutex in the
> +                        * pi state, but did not yet fixup the TID in
> +                        * user space.
> +                        *
> +                        * Take a ref on the state and return. [6]
>                          */
> -                       if (pid != task_pid_vnr(pi_state->owner))
> +                       if (!pid)
> +                               goto out_state;
> +               } else {
> +                       /*
> +                        * If the owner died bit is not set,
> +                        * then the pi_state must have an
> +                        * owner. [7]
> +                        */
> +                       if (!pi_state->owner)
>                                 return -EINVAL;
> -
> -               out_state:
> -                       atomic_inc(&pi_state->refcount);
> -                       *ps = pi_state;
> -                       return 0;
>                 }
> +
> +               /*
> +                * Bail out if user space manipulated the
> +                * futex value. If pi state exists then the
> +                * owner TID must be the same as the user
> +                * space TID. [9/10]
> +                */
> +               if (pid != task_pid_vnr(pi_state->owner))
> +                       return -EINVAL;
> +
> +       out_state:
> +               atomic_inc(&pi_state->refcount);
> +               *ps = pi_state;
> +               return 0;


Surely there is a better tool for viewing such "shift left one tab"
patches that don't make one track three if blocks in parallel in one's
head to ensure they all assemble back to the same thing? What do other
people use? Ouch...

Reviewed-by: Darren Hart <darren@dvhart.com>

--
Darren Hart
Intel Open Source Technology Center


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 3/5] futex: Split out the waiter check from lookup_pi_state()
  2014-06-11 20:45 ` [patch 3/5] futex: Split out the waiter check from lookup_pi_state() Thomas Gleixner
@ 2014-06-16 18:12   ` Darren Hart
  2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: Darren Hart @ 2014-06-16 18:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> We want to be a bit more clever in futex_lock_pi_atomic() and separate
> the possible states. Split out the waiter verification into a separate
> function. No functional change.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed for functional equivalence.

Reviewed-by: Darren Hart <darren@dvhart.com>


-- 
Darren Hart
Intel Open Source Technology Center



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 4/5] futex: Split out the first waiter attachment from lookup_pi_state()
  2014-06-11 20:45 ` [patch 4/5] futex: Split out the first waiter attachment from lookup_pi_state() Thomas Gleixner
@ 2014-06-16 18:19   ` Darren Hart
  2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: Darren Hart @ 2014-06-16 18:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> We want to be a bit more clever in futex_lock_pi_atomic() and separate
> the possible states. Split out the code which attaches the first
> waiter to the owner into a separate function. No functional change.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Reviewed for functional equivalence:

Reviewed-by: Darren Hart <darren@dvhart.com>

-- 
Darren Hart
Intel Open Source Technology Center



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch V2 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust
  2014-06-13  9:44         ` [patch V2 " Thomas Gleixner
  2014-06-13 20:51           ` Davidlohr Bueso
@ 2014-06-16 20:36           ` Darren Hart
  2014-06-17  7:20             ` Thomas Gleixner
  2014-06-21 20:34           ` [tip:locking/core] " tip-bot for Thomas Gleixner
  2 siblings, 1 reply; 26+ messages in thread
From: Darren Hart @ 2014-06-16 20:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Fri, 2014-06-13 at 11:44 +0200, Thomas Gleixner wrote:
> Subject: futex: Simplify futex_lock_pi_atomic() and make it more robust
> From: Thomas Gleixner <tglx@linutronix.de>
> Date: Wed, 11 Jun 2014 20:45:41 -0000
> 
> futex_lock_pi_atomic() is a maze of retry hoops and loops.
> 
> Reduce it to simple and understandable states:
> 
> First step is to lookup existing waiters (state) in the kernel.
> 
> If there is an existing waiter, validate it and attach to it.
> 
> If there is no existing waiter, check the user space value
> 
> If the TID encoded in the user space value is 0, take over the futex
> preserving the owner died bit.
> 
> If the TID encoded in the user space value is != 0, lookup the owner
> task, validate it and attach to it.
> 
> Reduces text size by 128 bytes on x8664.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Darren Hart <darren@dvhart.com>
> Cc: Davidlohr Bueso <davidlohr@hp.com>
> Cc: Kees Cook <kees@outflux.net>
> Cc: wad@chromium.org
> Link: http://lkml.kernel.org/r/20140611204237.361836310@linutronix.de
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
> 
> V2: Fixed the brown paperbag bug of V1
> 
>  kernel/futex.c |  141 ++++++++++++++++++++++-----------------------------------
>  1 file changed, 55 insertions(+), 86 deletions(-)
> 
> Index: linux/kernel/futex.c
> ===================================================================
> --- linux.orig/kernel/futex.c
> +++ linux/kernel/futex.c
> @@ -956,6 +956,17 @@ static int lookup_pi_state(u32 uval, str
>  	return attach_to_pi_owner(uval, key, ps);
>  }
>  
> +static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
> +{
> +	u32 uninitialized_var(curval);
> +
> +	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
> +		return -EFAULT;
> +
> +	/*If user space value changed, let the caller retry */
> +	return curval != uval ? -EAGAIN : 0;
> +}

Given the complexity of this update and how fragile this path can be, I
think this refactoring would be best done in an independent patch, as
you did with the previous two.

Two general concerns, we appear to be eliminating both the force_take
and the retry.

The force_take only occurs if TID==0, and that is covered here in a
cleaner way, so I believe we are good here.

As for the retry, the remaining use case (outside of TID==0 ->
force_take=1 -> retry) appears to be that userspace changed the value
while we were running. Reading the value early doesn't protect us from
this scenario. How does this change account for that?

It looks to me that before we would retry, while here we just give up
and return -EAGAIN..., which is addressed in futex_lock_pi(), but not in
the futex_requeue() callsite for futex_proxy_trylock_atomic. It does
handle it, but I guess also needs a comment update to "The owner was
exiting" to include "or userspace changed the value" as you did for
futex_lock_pi().

>From my analysis, this is a good cleanup and makes the code for
explicit. I'm nervous about missing corner cases, and would like to
understand what level of testing this has received. We need to add PI
locking tests to futextest. There are some in glibc. Which tests were
run to validate PI locking?

Thanks,

Darren Hart
> +
>  /**
>   * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
>   * @uaddr:		the pi futex user address
> @@ -979,113 +990,69 @@ static int futex_lock_pi_atomic(u32 __us
>  				struct futex_pi_state **ps,
>  				struct task_struct *task, int set_waiters)
>  {
> -	int lock_taken, ret, force_take = 0;
> -	u32 uval, newval, curval, vpid = task_pid_vnr(task);
> -
> -retry:
> -	ret = lock_taken = 0;
> +	u32 uval, newval, vpid = task_pid_vnr(task);
> +	struct futex_q *match;
> +	int ret;
>  
>  	/*
> -	 * To avoid races, we attempt to take the lock here again
> -	 * (by doing a 0 -> TID atomic cmpxchg), while holding all
> -	 * the locks. It will most likely not succeed.
> +	 * Read the user space value first so we can validate a few
> +	 * things before proceeding further.
>  	 */
> -	newval = vpid;
> -	if (set_waiters)
> -		newval |= FUTEX_WAITERS;
> -
> -	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)))
> +	if (get_futex_value_locked(&uval, uaddr))
>  		return -EFAULT;
>  
>  	/*
>  	 * Detect deadlocks.
>  	 */
> -	if ((unlikely((curval & FUTEX_TID_MASK) == vpid)))
> +	if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))
>  		return -EDEADLK;
>  
>  	/*
> -	 * Surprise - we got the lock, but we do not trust user space at all.
> +	 * Lookup existing state first. If it exists, try to attach to
> +	 * its pi_state.
>  	 */
> -	if (unlikely(!curval)) {
> -		/*
> -		 * We verify whether there is kernel state for this
> -		 * futex. If not, we can safely assume, that the 0 ->
> -		 * TID transition is correct. If state exists, we do
> -		 * not bother to fixup the user space state as it was
> -		 * corrupted already.
> -		 */
> -		return futex_top_waiter(hb, key) ? -EINVAL : 1;
> -	}
> -
> -	uval = curval;
> +	match = futex_top_waiter(hb, key);
> +	if (match)
> +		return attach_to_pi_state(uval, match->pi_state, ps);
>  
>  	/*
> -	 * Set the FUTEX_WAITERS flag, so the owner will know it has someone
> -	 * to wake at the next unlock.
> +	 * No waiter and user TID is 0. We are here because the
> +	 * waiters or the owner died bit is set or called from
> +	 * requeue_cmp_pi or for whatever reason something took the
> +	 * syscall.
>  	 */
> -	newval = curval | FUTEX_WAITERS;
> -
> -	/*
> -	 * Should we force take the futex? See below.
> -	 */
> -	if (unlikely(force_take)) {
> +	if (!(uval & FUTEX_TID_MASK)) {
>  		/*
> -		 * Keep the OWNER_DIED and the WAITERS bit and set the
> -		 * new TID value.
> +		 * We take over the futex. No other waiters and the user space
> +		 * TID is 0. We preserve the owner died bit.
>  		 */
> -		newval = (curval & ~FUTEX_TID_MASK) | vpid;
> -		force_take = 0;
> -		lock_taken = 1;
> -	}
> +		newval = uval & FUTEX_OWNER_DIED;
> +		newval |= vpid;
>  
> -	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
> -		return -EFAULT;
> -	if (unlikely(curval != uval))
> -		goto retry;
> +		/* The futex requeue_pi code can enforce the waiters bit */
> +		if (set_waiters)
> +			newval |= FUTEX_WAITERS;
> +
> +		ret = lock_pi_update_atomic(uaddr, uval, newval);
> +		/* If the take over worked, return 1 */
> +		return ret < 0 ? ret : 1;
> +	}
>  
>  	/*
> -	 * We took the lock due to forced take over.
> +	 * First waiter. Set the waiters bit before attaching ourself to
> +	 * the owner. If owner tries to unlock, it will be forced into
> +	 * the kernel and blocked on hb->lock.
>  	 */
> -	if (unlikely(lock_taken))
> -		return 1;
> -
> +	newval = uval | FUTEX_WAITERS;
> +	ret = lock_pi_update_atomic(uaddr, uval, newval);
> +	if (ret)
> +		return ret;
>  	/*
> -	 * We dont have the lock. Look up the PI state (or create it if
> -	 * we are the first waiter):
> +	 * If the update of the user space value succeeded, we try to
> +	 * attach to the owner. If that fails, no harm done, we only
> +	 * set the FUTEX_WAITERS bit in the user space variable.
>  	 */
> -	ret = lookup_pi_state(uval, hb, key, ps);
> -
> -	if (unlikely(ret)) {
> -		switch (ret) {
> -		case -ESRCH:
> -			/*
> -			 * We failed to find an owner for this
> -			 * futex. So we have no pi_state to block
> -			 * on. This can happen in two cases:
> -			 *
> -			 * 1) The owner died
> -			 * 2) A stale FUTEX_WAITERS bit
> -			 *
> -			 * Re-read the futex value.
> -			 */
> -			if (get_futex_value_locked(&curval, uaddr))
> -				return -EFAULT;
> -
> -			/*
> -			 * If the owner died or we have a stale
> -			 * WAITERS bit the owner TID in the user space
> -			 * futex is 0.
> -			 */
> -			if (!(curval & FUTEX_TID_MASK)) {
> -				force_take = 1;
> -				goto retry;
> -			}
> -		default:
> -			break;
> -		}
> -	}
> -
> -	return ret;
> +	return attach_to_pi_owner(uval, key, ps);
>  }
>  
>  /**
> @@ -2316,8 +2283,10 @@ retry_private:
>  			goto uaddr_faulted;
>  		case -EAGAIN:
>  			/*
> -			 * Task is exiting and we just wait for the
> -			 * exit to complete.
> +			 * Two reasons for this:
> +			 * - Task is exiting and we just wait for the
> +			 *   exit to complete.
> +			 * - The user space value changed.
>  			 */
>  			queue_unlock(hb);
>  			put_futex_key(&q.key);

-- 
Darren Hart
Intel Open Source Technology Center



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 1/5] futex: Make unlock_pi more robust
  2014-06-16 16:18   ` Darren Hart
@ 2014-06-16 22:15     ` Thomas Gleixner
  2014-06-16 22:28       ` Thomas Gleixner
  2014-06-16 22:39       ` Darren Hart
  0 siblings, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-16 22:15 UTC (permalink / raw)
  To: Darren Hart
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Mon, 16 Jun 2014, Darren Hart wrote:
> On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> >  static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
> > @@ -2417,57 +2401,47 @@ retry:
> >                 return -EPERM;
> >  
> >         ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key,
> > VERIFY_WRITE);
> > -       if (unlikely(ret != 0))
> > -               goto out;
> > +       if (ret)
> > +               return ret;
> 
> Looks like you're also trying to move away from a single exit point to
> multiple exit points. I prefer the single exit (which you've probably
> noticed :-), but it's a subjective thing, and so long as we are not
> duplicating cleanup logic, I guess it's fine either way. This change was
> not mentioned in the commit message though.

I really did not think about mentioning that :)
 
> > +        * Check waiters first. We do not trust user space values at
> > +        * all and we at least want to know if user space fiddled
> > +        * with the futex value instead of blindly unlocking.
> > +        */
> > +       match = futex_top_waiter(hb, &key);
> > +       if (match) {
> > +               ret = wake_futex_pi(uaddr, uval, match);
> >                 /*
> > -                * The atomic access to the futex value
> > -                * generated a pagefault, so retry the
> > -                * user-access and the wakeup:
> > +                * The atomic access to the futex value generated a
> > +                * pagefault, so retry the user-access and the wakeup:
> >                  */
> >                 if (ret == -EFAULT)
> >                         goto pi_faulted;
> >                 goto out_unlock;
> >         }
> > +
> >         /*
> > -        * No waiters - kernel unlocks the futex:
> > +        * We have no kernel internal state, i.e. no waiters in the
> > +        * kernel. Waiters which are about to queue themself are stuck
> 
> themselves
> 
> > +        * on hb->lock. So we can safely ignore them. We do neither
> > +        * preserve the WAITERS bit not the OWNER_DIED one. We are the
> 
> We preserve neither the WAITERS bit nor the OWNER_DIED bit.
> (the above use of "do" and "not" is incorrect and could easily be
> misinterpreted).
> 
> > +        * owner.
> 
> In wake_futex_pi we verify ownership by matching pi_state->owner ==
> current, but here the only test is the TID value, which is set by
> userspace - which we don't trust...
> 
> I'm trying to determine if it matters in this case... if there are no
> waiters, is the pi_state still around? If so, it does indeed matter, and
> we should be verifying.

Erm. The whole point of this patch is to do:

     - Find existing state first and handle it.

     - If no state exists and TID == current, take it

     - Otherwise create state

This all happens under hb->lock. So how should something create new
state after we looked up existing state?
 
> >          */
> > -       ret = unlock_futex_pi(uaddr, uval);
> > -       if (ret == -EFAULT)
> > +       if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))
> >                 goto pi_faulted;
> >  
> 
> This refactoring seems like it would be best done as a prequel patch so
> as not to confuse cleanup with functional change. At least that is what
> you and others have beaten into me over the years ;-)

Well, yes and no. I'll hapilly discuss that without after clarifying
the issue below.

> > +       /*
> > +        * If uval has changed, let user space handle it.
> > +        */
> > +       ret = (curval == uval) ? 0 : -EAGAIN;
> > +
> >  out_unlock:
> >         spin_unlock(&hb->lock);
> >         put_futex_key(&key);
> > -
> > -out:
> >         return ret;
> >  
> 
> By dropping this you won't return ret, but rather fall through into
> pi_faulted... which certainly isn't what you wanted.

By dropping the now unused "out" label I'm not longer returning ret?
 
The resulting code is:

out_unlock:
	spin_unlock(&hb->lock);
	put_futex_key(&key);
	return ret;

If you can explain me how that "return ret" falls through to
pi_faulted magically, then I'm definitely agreeing with you on this:

> The need for better test coverage is very evident now :-)

  -ENOTENOUGHSLEEP or -ENOTENOUGHCOFFEE or -ENOGLASSES perhaps?

I'm omitting some other politicially incorrect speculations for now.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 1/5] futex: Make unlock_pi more robust
  2014-06-16 22:15     ` Thomas Gleixner
@ 2014-06-16 22:28       ` Thomas Gleixner
  2014-06-16 22:49         ` Darren Hart
  2014-06-16 22:39       ` Darren Hart
  1 sibling, 1 reply; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-16 22:28 UTC (permalink / raw)
  To: Darren Hart
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Tue, 17 Jun 2014, Thomas Gleixner wrote:
> On Mon, 16 Jun 2014, Darren Hart wrote:
> > On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> > In wake_futex_pi we verify ownership by matching pi_state->owner ==
> > current, but here the only test is the TID value, which is set by
> > userspace - which we don't trust...
> > 
> > I'm trying to determine if it matters in this case... if there are no
> > waiters, is the pi_state still around? If so, it does indeed matter, and
> > we should be verifying.
> 
> Erm. The whole point of this patch is to do:
> 
>      - Find existing state first and handle it.
> 
>      - If no state exists and TID == current, take it
> 
>      - Otherwise create state

Duh, that was the lock path. But here the point is:

      - Find existing state first and handle it.

      - If no state exists and TID == current, release it

The retry is obvious, right?

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 1/5] futex: Make unlock_pi more robust
  2014-06-16 22:15     ` Thomas Gleixner
  2014-06-16 22:28       ` Thomas Gleixner
@ 2014-06-16 22:39       ` Darren Hart
  1 sibling, 0 replies; 26+ messages in thread
From: Darren Hart @ 2014-06-16 22:39 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Tue, 2014-06-17 at 00:15 +0200, Thomas Gleixner wrote:
> On Mon, 16 Jun 2014, Darren Hart wrote:
> > On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> > >  static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
> > > @@ -2417,57 +2401,47 @@ retry:
> > >                 return -EPERM;
> > >  
> > >         ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key,
> > > VERIFY_WRITE);
> > > -       if (unlikely(ret != 0))
> > > -               goto out;
> > > +       if (ret)
> > > +               return ret;
> > 
> > Looks like you're also trying to move away from a single exit point to
> > multiple exit points. I prefer the single exit (which you've probably
> > noticed :-), but it's a subjective thing, and so long as we are not
> > duplicating cleanup logic, I guess it's fine either way. This change was
> > not mentioned in the commit message though.
> 
> I really did not think about mentioning that :)
>  
> > > +        * Check waiters first. We do not trust user space values at
> > > +        * all and we at least want to know if user space fiddled
> > > +        * with the futex value instead of blindly unlocking.
> > > +        */
> > > +       match = futex_top_waiter(hb, &key);
> > > +       if (match) {
> > > +               ret = wake_futex_pi(uaddr, uval, match);
> > >                 /*
> > > -                * The atomic access to the futex value
> > > -                * generated a pagefault, so retry the
> > > -                * user-access and the wakeup:
> > > +                * The atomic access to the futex value generated a
> > > +                * pagefault, so retry the user-access and the wakeup:
> > >                  */
> > >                 if (ret == -EFAULT)
> > >                         goto pi_faulted;
> > >                 goto out_unlock;
> > >         }
> > > +
> > >         /*
> > > -        * No waiters - kernel unlocks the futex:
> > > +        * We have no kernel internal state, i.e. no waiters in the
> > > +        * kernel. Waiters which are about to queue themself are stuck
> > 
> > themselves
> > 
> > > +        * on hb->lock. So we can safely ignore them. We do neither
> > > +        * preserve the WAITERS bit not the OWNER_DIED one. We are the
> > 
> > We preserve neither the WAITERS bit nor the OWNER_DIED bit.
> > (the above use of "do" and "not" is incorrect and could easily be
> > misinterpreted).
> > 
> > > +        * owner.
> > 
> > In wake_futex_pi we verify ownership by matching pi_state->owner ==
> > current, but here the only test is the TID value, which is set by
> > userspace - which we don't trust...
> > 
> > I'm trying to determine if it matters in this case... if there are no
> > waiters, is the pi_state still around? If so, it does indeed matter, and
> > we should be verifying.
> 
> Erm. The whole point of this patch is to do:
> 
>      - Find existing state first and handle it.hehheh
> 
>      - If no state exists and TID == current, take it
> 
>      - Otherwise create state

Right, by 5/5 I realized that, duh, if there are no waiters, you can't
find a pi_state to check, so we have no choice here. Sorry, should have
come back and said as much. We can drop this concern.

> This all happens under hb->lock. So how should something create new
> state after we looked up existing state?
>  
> > >          */
> > > -       ret = unlock_futex_pi(uaddr, uval);
> > > -       if (ret == -EFAULT)
> > > +       if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))
> > >                 goto pi_faulted;
> > >  
> > 
> > This refactoring seems like it would be best done as a prequel patch so
> > as not to confuse cleanup with functional change. At least that is what
> > you and others have beaten into me over the years ;-)
> 
> Well, yes and no. I'll hapilly discuss that without after clarifying
> the issue below.
> 
> > > +       /*
> > > +        * If uval has changed, let user space handle it.
> > > +        */
> > > +       ret = (curval == uval) ? 0 : -EAGAIN;
> > > +
> > >  out_unlock:
> > >         spin_unlock(&hb->lock);
> > >         put_futex_key(&key);
> > > -
> > > -out:
> > >         return ret;
> > >  
> > 
> > By dropping this you won't return ret, but rather fall through into
> > pi_faulted... which certainly isn't what you wanted.
> 
> By dropping the now unused "out" label I'm not longer returning ret?
>  

Sigh. Nevermind.

> The resulting code is:
> 
> out_unlock:
> 	spin_unlock(&hb->lock);
> 	put_futex_key(&key);
> 	return ret;
> 
> If you can explain me how that "return ret" falls through to
> pi_faulted magically, then I'm definitely agreeing with you on this:
> 
> > The need for better test coverage is very evident now :-)
> 
>   -ENOTENOUGHSLEEP or -ENOTENOUGHCOFFEE or -ENOGLASSES perhaps?
> 
> I'm omitting some other politicially incorrect speculations for now.

Guilty as charged, on all counts I'm afraid.

So, my only reservation is the possible splitting out of the refactoring
bit. But that's minor. Either way:

Reviewed-by: Darren Hart <darren@dvhart.com>


-- 
Darren Hart
Intel Open Source Technology Center



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch 1/5] futex: Make unlock_pi more robust
  2014-06-16 22:28       ` Thomas Gleixner
@ 2014-06-16 22:49         ` Darren Hart
  0 siblings, 0 replies; 26+ messages in thread
From: Darren Hart @ 2014-06-16 22:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Tue, 2014-06-17 at 00:28 +0200, Thomas Gleixner wrote:
> On Tue, 17 Jun 2014, Thomas Gleixner wrote:
> > On Mon, 16 Jun 2014, Darren Hart wrote:
> > > On Wed, 2014-06-11 at 20:45 +0000, Thomas Gleixner wrote:
> > > In wake_futex_pi we verify ownership by matching pi_state->owner ==
> > > current, but here the only test is the TID value, which is set by
> > > userspace - which we don't trust...
> > > 
> > > I'm trying to determine if it matters in this case... if there are no
> > > waiters, is the pi_state still around? If so, it does indeed matter, and
> > > we should be verifying.
> > 
> > Erm. The whole point of this patch is to do:
> > 
> >      - Find existing state first and handle it.
> > 
> >      - If no state exists and TID == current, take it
> > 
> >      - Otherwise create state
> 
> Duh, that was the lock path. But here the point is:
> 
>       - Find existing state first and handle it.
> 
>       - If no state exists and TID == current, release it
> 

Right, I understood your meaning, and I withdraw the concern.

> The retry is obvious, right?

Yes.

-- 
Darren Hart
Intel Open Source Technology Center



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [patch V2 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust
  2014-06-16 20:36           ` Darren Hart
@ 2014-06-17  7:20             ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2014-06-17  7:20 UTC (permalink / raw)
  To: Darren Hart
  Cc: LKML, Peter Zijlstra, Ingo Molnar, Davidlohr Bueso, Kees Cook, wad

On Mon, 16 Jun 2014, Darren Hart wrote:
> On Fri, 2014-06-13 at 11:44 +0200, Thomas Gleixner wrote:
> Two general concerns, we appear to be eliminating both the force_take
> and the retry.
> 
> The force_take only occurs if TID==0, and that is covered here in a
> cleaner way, so I believe we are good here.
> 
> As for the retry, the remaining use case (outside of TID==0 ->
> force_take=1 -> retry) appears to be that userspace changed the value
> while we were running. Reading the value early doesn't protect us from
> this scenario. How does this change account for that?
>       
> It looks to me that before we would retry, while here we just give up
> and return -EAGAIN..., which is addressed in futex_lock_pi(), but not in
> the futex_requeue() callsite for futex_proxy_trylock_atomic. It does
> handle it, but I guess also needs a comment update to "The owner was
> exiting" to include "or userspace changed the value" as you did for
> futex_lock_pi().

Right. I moved the handling back to the call site which has to handle
the various error return codes anyway. 
 
> >From my analysis, this is a good cleanup and makes the code for
> explicit. I'm nervous about missing corner cases, and would like to
> understand what level of testing this has received. We need to add PI
> locking tests to futextest. There are some in glibc. Which tests were
> run to validate PI locking?

I ran it against everything I have. libc tests, your stuff, rt-tests
and random exploit and corner case exposure code I collected/wrote in
the last few weeks.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [tip:locking/core] futex: Make unlock_pi more robust
  2014-06-11 20:45 ` [patch 1/5] futex: Make unlock_pi more robust Thomas Gleixner
  2014-06-16 16:18   ` Darren Hart
@ 2014-06-21 20:33   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-06-21 20:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, darren, peterz, kees, tglx, davidlohr

Commit-ID:  ccf9e6a80d9e1b9df69c98e6b9745cf49869ee15
Gitweb:     http://git.kernel.org/tip/ccf9e6a80d9e1b9df69c98e6b9745cf49869ee15
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 11 Jun 2014 20:45:38 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 21 Jun 2014 22:26:23 +0200

futex: Make unlock_pi more robust

The kernel tries to atomically unlock the futex without checking
whether there is kernel state associated to the futex.

So if user space manipulated the user space value, this will leave
kernel internal state around associated to the owner task. 

For robustness sake, lookup first whether there are waiters on the
futex. If there are waiters, wake the top priority waiter with all the
proper sanity checks applied.

If there are no waiters, do the atomic release. We do not have to
preserve the waiters bit in this case, because a potentially incoming
waiter is blocked on the hb->lock and will acquire the futex
atomically. We neither have to preserve the owner died bit. The caller
is the owner and it was supposed to cleanup the mess.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.016987332@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c | 76 +++++++++++++++++++---------------------------------------
 1 file changed, 25 insertions(+), 51 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e5c6c40..346d5c2 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1186,22 +1186,6 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
 	return 0;
 }
 
-static int unlock_futex_pi(u32 __user *uaddr, u32 uval)
-{
-	u32 uninitialized_var(oldval);
-
-	/*
-	 * There is no waiter, so we unlock the futex. The owner died
-	 * bit has not to be preserved here. We are the owner:
-	 */
-	if (cmpxchg_futex_value_locked(&oldval, uaddr, uval, 0))
-		return -EFAULT;
-	if (oldval != uval)
-		return -EAGAIN;
-
-	return 0;
-}
-
 /*
  * Express the locking dependencies for lockdep:
  */
@@ -2401,10 +2385,10 @@ uaddr_faulted:
  */
 static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags)
 {
-	struct futex_hash_bucket *hb;
-	struct futex_q *this, *next;
+	u32 uninitialized_var(curval), uval, vpid = task_pid_vnr(current);
 	union futex_key key = FUTEX_KEY_INIT;
-	u32 uval, vpid = task_pid_vnr(current);
+	struct futex_hash_bucket *hb;
+	struct futex_q *match;
 	int ret;
 
 retry:
@@ -2417,57 +2401,47 @@ retry:
 		return -EPERM;
 
 	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key, VERIFY_WRITE);
-	if (unlikely(ret != 0))
-		goto out;
+	if (ret)
+		return ret;
 
 	hb = hash_futex(&key);
 	spin_lock(&hb->lock);
 
 	/*
-	 * To avoid races, try to do the TID -> 0 atomic transition
-	 * again. If it succeeds then we can return without waking
-	 * anyone else up. We only try this if neither the waiters nor
-	 * the owner died bit are set.
+	 * Check waiters first. We do not trust user space values at
+	 * all and we at least want to know if user space fiddled
+	 * with the futex value instead of blindly unlocking.
 	 */
-	if (!(uval & ~FUTEX_TID_MASK) &&
-	    cmpxchg_futex_value_locked(&uval, uaddr, vpid, 0))
-		goto pi_faulted;
-	/*
-	 * Rare case: we managed to release the lock atomically,
-	 * no need to wake anyone else up:
-	 */
-	if (unlikely(uval == vpid))
-		goto out_unlock;
-
-	/*
-	 * Ok, other tasks may need to be woken up - check waiters
-	 * and do the wakeup if necessary:
-	 */
-	plist_for_each_entry_safe(this, next, &hb->chain, list) {
-		if (!match_futex (&this->key, &key))
-			continue;
-		ret = wake_futex_pi(uaddr, uval, this);
+	match = futex_top_waiter(hb, &key);
+	if (match) {
+		ret = wake_futex_pi(uaddr, uval, match);
 		/*
-		 * The atomic access to the futex value
-		 * generated a pagefault, so retry the
-		 * user-access and the wakeup:
+		 * The atomic access to the futex value generated a
+		 * pagefault, so retry the user-access and the wakeup:
 		 */
 		if (ret == -EFAULT)
 			goto pi_faulted;
 		goto out_unlock;
 	}
+
 	/*
-	 * No waiters - kernel unlocks the futex:
+	 * We have no kernel internal state, i.e. no waiters in the
+	 * kernel. Waiters which are about to queue themselves are stuck
+	 * on hb->lock. So we can safely ignore them. We do neither
+	 * preserve the WAITERS bit not the OWNER_DIED one. We are the
+	 * owner.
 	 */
-	ret = unlock_futex_pi(uaddr, uval);
-	if (ret == -EFAULT)
+	if (cmpxchg_futex_value_locked(&curval, uaddr, uval, 0))
 		goto pi_faulted;
 
+	/*
+	 * If uval has changed, let user space handle it.
+	 */
+	ret = (curval == uval) ? 0 : -EAGAIN;
+
 out_unlock:
 	spin_unlock(&hb->lock);
 	put_futex_key(&key);
-
-out:
 	return ret;
 
 pi_faulted:

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [tip:locking/core] futex: Use futex_top_waiter() in lookup_pi_state()
  2014-06-11 20:45 ` [patch 2/5] futex: Use futex_top_waiter() in lookup_pi_state() Thomas Gleixner
  2014-06-16 16:51   ` Darren Hart
@ 2014-06-21 20:33   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-06-21 20:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, darren, kees, tglx, davidlohr

Commit-ID:  bd1dbcc67cd2c1181e2c01daac51eabf1b964dd8
Gitweb:     http://git.kernel.org/tip/bd1dbcc67cd2c1181e2c01daac51eabf1b964dd8
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 11 Jun 2014 20:45:39 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 21 Jun 2014 22:26:23 +0200

futex: Use futex_top_waiter() in lookup_pi_state()

No point in open coding the same function again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.092947239@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c | 124 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 61 insertions(+), 63 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 346d5c2..fff1ed9 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -796,87 +796,85 @@ static int
 lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 		union futex_key *key, struct futex_pi_state **ps)
 {
+	struct futex_q *match = futex_top_waiter(hb, key);
 	struct futex_pi_state *pi_state = NULL;
-	struct futex_q *this, *next;
 	struct task_struct *p;
 	pid_t pid = uval & FUTEX_TID_MASK;
 
-	plist_for_each_entry_safe(this, next, &hb->chain, list) {
-		if (match_futex(&this->key, key)) {
-			/*
-			 * Sanity check the waiter before increasing
-			 * the refcount and attaching to it.
-			 */
-			pi_state = this->pi_state;
-			/*
-			 * Userspace might have messed up non-PI and
-			 * PI futexes [3]
-			 */
-			if (unlikely(!pi_state))
-				return -EINVAL;
+	if (match) {
+		/*
+		 * Sanity check the waiter before increasing the
+		 * refcount and attaching to it.
+		 */
+		pi_state = match->pi_state;
+		/*
+		 * Userspace might have messed up non-PI and PI
+		 * futexes [3]
+		 */
+		if (unlikely(!pi_state))
+			return -EINVAL;
 
-			WARN_ON(!atomic_read(&pi_state->refcount));
+		WARN_ON(!atomic_read(&pi_state->refcount));
 
+		/*
+		 * Handle the owner died case:
+		 */
+		if (uval & FUTEX_OWNER_DIED) {
 			/*
-			 * Handle the owner died case:
+			 * exit_pi_state_list sets owner to NULL and
+			 * wakes the topmost waiter. The task which
+			 * acquires the pi_state->rt_mutex will fixup
+			 * owner.
 			 */
-			if (uval & FUTEX_OWNER_DIED) {
+			if (!pi_state->owner) {
 				/*
-				 * exit_pi_state_list sets owner to NULL and
-				 * wakes the topmost waiter. The task which
-				 * acquires the pi_state->rt_mutex will fixup
-				 * owner.
+				 * No pi state owner, but the user
+				 * space TID is not 0. Inconsistent
+				 * state. [5]
 				 */
-				if (!pi_state->owner) {
-					/*
-					 * No pi state owner, but the user
-					 * space TID is not 0. Inconsistent
-					 * state. [5]
-					 */
-					if (pid)
-						return -EINVAL;
-					/*
-					 * Take a ref on the state and
-					 * return. [4]
-					 */
-					goto out_state;
-				}
-
-				/*
-				 * If TID is 0, then either the dying owner
-				 * has not yet executed exit_pi_state_list()
-				 * or some waiter acquired the rtmutex in the
-				 * pi state, but did not yet fixup the TID in
-				 * user space.
-				 *
-				 * Take a ref on the state and return. [6]
-				 */
-				if (!pid)
-					goto out_state;
-			} else {
+				if (pid)
+					return -EINVAL;
 				/*
-				 * If the owner died bit is not set,
-				 * then the pi_state must have an
-				 * owner. [7]
+				 * Take a ref on the state and
+				 * return. [4]
 				 */
-				if (!pi_state->owner)
-					return -EINVAL;
+				goto out_state;
 			}
 
 			/*
-			 * Bail out if user space manipulated the
-			 * futex value. If pi state exists then the
-			 * owner TID must be the same as the user
-			 * space TID. [9/10]
+			 * If TID is 0, then either the dying owner
+			 * has not yet executed exit_pi_state_list()
+			 * or some waiter acquired the rtmutex in the
+			 * pi state, but did not yet fixup the TID in
+			 * user space.
+			 *
+			 * Take a ref on the state and return. [6]
+			 */
+			if (!pid)
+				goto out_state;
+		} else {
+			/*
+			 * If the owner died bit is not set,
+			 * then the pi_state must have an
+			 * owner. [7]
 			 */
-			if (pid != task_pid_vnr(pi_state->owner))
+			if (!pi_state->owner)
 				return -EINVAL;
-
-		out_state:
-			atomic_inc(&pi_state->refcount);
-			*ps = pi_state;
-			return 0;
 		}
+
+		/*
+		 * Bail out if user space manipulated the
+		 * futex value. If pi state exists then the
+		 * owner TID must be the same as the user
+		 * space TID. [9/10]
+		 */
+		if (pid != task_pid_vnr(pi_state->owner))
+			return -EINVAL;
+
+	out_state:
+		atomic_inc(&pi_state->refcount);
+		*ps = pi_state;
+		return 0;
 	}
 
 	/*

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [tip:locking/core] futex: Split out the waiter check from lookup_pi_state()
  2014-06-11 20:45 ` [patch 3/5] futex: Split out the waiter check from lookup_pi_state() Thomas Gleixner
  2014-06-16 18:12   ` Darren Hart
@ 2014-06-21 20:33   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-06-21 20:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, darren, kees, tglx, davidlohr

Commit-ID:  e60cbc5ceaa518d630ab8f35a7d05cee1c752648
Gitweb:     http://git.kernel.org/tip/e60cbc5ceaa518d630ab8f35a7d05cee1c752648
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 11 Jun 2014 20:45:39 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 21 Jun 2014 22:26:23 +0200

futex: Split out the waiter check from lookup_pi_state()

We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the waiter verification into a separate
function. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.180458410@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c | 138 +++++++++++++++++++++++++++++----------------------------
 1 file changed, 71 insertions(+), 67 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index fff1ed9..db0c686 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -792,92 +792,96 @@ void exit_pi_state_list(struct task_struct *curr)
  * [10] There is no transient state which leaves owner and user space
  *	TID out of sync.
  */
-static int
-lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
-		union futex_key *key, struct futex_pi_state **ps)
+
+/*
+ * Validate that the existing waiter has a pi_state and sanity check
+ * the pi_state against the user space value. If correct, attach to
+ * it.
+ */
+static int attach_to_pi_state(u32 uval, struct futex_pi_state *pi_state,
+			      struct futex_pi_state **ps)
 {
-	struct futex_q *match = futex_top_waiter(hb, key);
-	struct futex_pi_state *pi_state = NULL;
-	struct task_struct *p;
 	pid_t pid = uval & FUTEX_TID_MASK;
 
-	if (match) {
-		/*
-		 * Sanity check the waiter before increasing the
-		 * refcount and attaching to it.
-		 */
-		pi_state = match->pi_state;
-		/*
-		 * Userspace might have messed up non-PI and PI
-		 * futexes [3]
-		 */
-		if (unlikely(!pi_state))
-			return -EINVAL;
+	/*
+	 * Userspace might have messed up non-PI and PI futexes [3]
+	 */
+	if (unlikely(!pi_state))
+		return -EINVAL;
 
-		WARN_ON(!atomic_read(&pi_state->refcount));
+	WARN_ON(!atomic_read(&pi_state->refcount));
 
+	/*
+	 * Handle the owner died case:
+	 */
+	if (uval & FUTEX_OWNER_DIED) {
 		/*
-		 * Handle the owner died case:
+		 * exit_pi_state_list sets owner to NULL and wakes the
+		 * topmost waiter. The task which acquires the
+		 * pi_state->rt_mutex will fixup owner.
 		 */
-		if (uval & FUTEX_OWNER_DIED) {
+		if (!pi_state->owner) {
 			/*
-			 * exit_pi_state_list sets owner to NULL and
-			 * wakes the topmost waiter. The task which
-			 * acquires the pi_state->rt_mutex will fixup
-			 * owner.
+			 * No pi state owner, but the user space TID
+			 * is not 0. Inconsistent state. [5]
 			 */
-			if (!pi_state->owner) {
-				/*
-				 * No pi state owner, but the user
-				 * space TID is not 0. Inconsistent
-				 * state. [5]
-				 */
-				if (pid)
-					return -EINVAL;
-				/*
-				 * Take a ref on the state and
-				 * return. [4]
-				 */
-				goto out_state;
-			}
-
-			/*
-			 * If TID is 0, then either the dying owner
-			 * has not yet executed exit_pi_state_list()
-			 * or some waiter acquired the rtmutex in the
-			 * pi state, but did not yet fixup the TID in
-			 * user space.
-			 *
-			 * Take a ref on the state and return. [6]
-			 */
-			if (!pid)
-				goto out_state;
-		} else {
+			if (pid)
+				return -EINVAL;
 			/*
-			 * If the owner died bit is not set,
-			 * then the pi_state must have an
-			 * owner. [7]
+			 * Take a ref on the state and return success. [4]
 			 */
-			if (!pi_state->owner)
-				return -EINVAL;
+			goto out_state;
 		}
 
 		/*
-		 * Bail out if user space manipulated the
-		 * futex value. If pi state exists then the
-		 * owner TID must be the same as the user
-		 * space TID. [9/10]
+		 * If TID is 0, then either the dying owner has not
+		 * yet executed exit_pi_state_list() or some waiter
+		 * acquired the rtmutex in the pi state, but did not
+		 * yet fixup the TID in user space.
+		 *
+		 * Take a ref on the state and return success. [6]
 		 */
-		if (pid != task_pid_vnr(pi_state->owner))
+		if (!pid)
+			goto out_state;
+	} else {
+		/*
+		 * If the owner died bit is not set, then the pi_state
+		 * must have an owner. [7]
+		 */
+		if (!pi_state->owner)
 			return -EINVAL;
-
-	out_state:
-		atomic_inc(&pi_state->refcount);
-		*ps = pi_state;
-		return 0;
 	}
 
 	/*
+	 * Bail out if user space manipulated the futex value. If pi
+	 * state exists then the owner TID must be the same as the
+	 * user space TID. [9/10]
+	 */
+	if (pid != task_pid_vnr(pi_state->owner))
+		return -EINVAL;
+out_state:
+	atomic_inc(&pi_state->refcount);
+	*ps = pi_state;
+	return 0;
+}
+
+static int
+lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
+		union futex_key *key, struct futex_pi_state **ps)
+{
+	struct futex_q *match = futex_top_waiter(hb, key);
+	struct futex_pi_state *pi_state = NULL;
+	struct task_struct *p;
+	pid_t pid = uval & FUTEX_TID_MASK;
+
+	/*
+	 * If there is a waiter on that futex, validate it and
+	 * attach to the pi_state when the validation succeeds.
+	 */
+	if (match)
+		return attach_to_pi_state(uval, match->pi_state, ps);
+
+	/*
 	 * We are the first waiter - try to look up the real owner and attach
 	 * the new pi_state to it, but bail out when TID = 0 [1]
 	 */

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [tip:locking/core] futex: Split out the first waiter attachment from lookup_pi_state()
  2014-06-11 20:45 ` [patch 4/5] futex: Split out the first waiter attachment from lookup_pi_state() Thomas Gleixner
  2014-06-16 18:19   ` Darren Hart
@ 2014-06-21 20:33   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 26+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-06-21 20:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, darren, kees, tglx, davidlohr

Commit-ID:  04e1b2e52b17195c9a1daa5935c55a4c8716095c
Gitweb:     http://git.kernel.org/tip/04e1b2e52b17195c9a1daa5935c55a4c8716095c
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 11 Jun 2014 20:45:40 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 21 Jun 2014 22:26:23 +0200

futex: Split out the first waiter attachment from lookup_pi_state()

We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the code which attaches the first
waiter to the owner into a separate function. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.271300614@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c | 42 ++++++++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index db0c686..e65b686 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -865,21 +865,16 @@ out_state:
 	return 0;
 }
 
-static int
-lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
-		union futex_key *key, struct futex_pi_state **ps)
+/*
+ * Lookup the task for the TID provided from user space and attach to
+ * it after doing proper sanity checks.
+ */
+static int attach_to_pi_owner(u32 uval, union futex_key *key,
+			      struct futex_pi_state **ps)
 {
-	struct futex_q *match = futex_top_waiter(hb, key);
-	struct futex_pi_state *pi_state = NULL;
-	struct task_struct *p;
 	pid_t pid = uval & FUTEX_TID_MASK;
-
-	/*
-	 * If there is a waiter on that futex, validate it and
-	 * attach to the pi_state when the validation succeeds.
-	 */
-	if (match)
-		return attach_to_pi_state(uval, match->pi_state, ps);
+	struct futex_pi_state *pi_state;
+	struct task_struct *p;
 
 	/*
 	 * We are the first waiter - try to look up the real owner and attach
@@ -922,7 +917,7 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 	pi_state = alloc_pi_state();
 
 	/*
-	 * Initialize the pi_mutex in locked state and make 'p'
+	 * Initialize the pi_mutex in locked state and make @p
 	 * the owner of it:
 	 */
 	rt_mutex_init_proxy_locked(&pi_state->pi_mutex, p);
@@ -942,6 +937,25 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 	return 0;
 }
 
+static int lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
+			   union futex_key *key, struct futex_pi_state **ps)
+{
+	struct futex_q *match = futex_top_waiter(hb, key);
+
+	/*
+	 * If there is a waiter on that futex, validate it and
+	 * attach to the pi_state when the validation succeeds.
+	 */
+	if (match)
+		return attach_to_pi_state(uval, match->pi_state, ps);
+
+	/*
+	 * We are the first waiter - try to look up the owner based on
+	 * @uval and attach to it.
+	 */
+	return attach_to_pi_owner(uval, key, ps);
+}
+
 /**
  * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
  * @uaddr:		the pi futex user address

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [tip:locking/core] futex: Simplify futex_lock_pi_atomic() and make it more robust
  2014-06-13  9:44         ` [patch V2 " Thomas Gleixner
  2014-06-13 20:51           ` Davidlohr Bueso
  2014-06-16 20:36           ` Darren Hart
@ 2014-06-21 20:34           ` tip-bot for Thomas Gleixner
  2 siblings, 0 replies; 26+ messages in thread
From: tip-bot for Thomas Gleixner @ 2014-06-21 20:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, darren, peterz, kees, tglx, davidlohr

Commit-ID:  af54d6a1c3ad474bbc9893c9905022646be6092c
Gitweb:     http://git.kernel.org/tip/af54d6a1c3ad474bbc9893c9905022646be6092c
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 11 Jun 2014 20:45:41 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 21 Jun 2014 22:26:24 +0200

futex: Simplify futex_lock_pi_atomic() and make it more robust

futex_lock_pi_atomic() is a maze of retry hoops and loops.

Reduce it to simple and understandable states:

First step is to lookup existing waiters (state) in the kernel.

If there is an existing waiter, validate it and attach to it.

If there is no existing waiter, check the user space value

If the TID encoded in the user space value is 0, take over the futex
preserving the owner died bit.

If the TID encoded in the user space value is != 0, lookup the owner
task, validate it and attach to it.

Reduces text size by 128 bytes on x8664.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Cc: Darren Hart <darren@dvhart.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406131137020.5170@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/futex.c | 148 ++++++++++++++++++++++++---------------------------------
 1 file changed, 61 insertions(+), 87 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e65b686..d3a9d94 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -956,6 +956,17 @@ static int lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 	return attach_to_pi_owner(uval, key, ps);
 }
 
+static int lock_pi_update_atomic(u32 __user *uaddr, u32 uval, u32 newval)
+{
+	u32 uninitialized_var(curval);
+
+	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
+		return -EFAULT;
+
+	/*If user space value changed, let the caller retry */
+	return curval != uval ? -EAGAIN : 0;
+}
+
 /**
  * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
  * @uaddr:		the pi futex user address
@@ -979,113 +990,69 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
 				struct futex_pi_state **ps,
 				struct task_struct *task, int set_waiters)
 {
-	int lock_taken, ret, force_take = 0;
-	u32 uval, newval, curval, vpid = task_pid_vnr(task);
-
-retry:
-	ret = lock_taken = 0;
+	u32 uval, newval, vpid = task_pid_vnr(task);
+	struct futex_q *match;
+	int ret;
 
 	/*
-	 * To avoid races, we attempt to take the lock here again
-	 * (by doing a 0 -> TID atomic cmpxchg), while holding all
-	 * the locks. It will most likely not succeed.
+	 * Read the user space value first so we can validate a few
+	 * things before proceeding further.
 	 */
-	newval = vpid;
-	if (set_waiters)
-		newval |= FUTEX_WAITERS;
-
-	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, 0, newval)))
+	if (get_futex_value_locked(&uval, uaddr))
 		return -EFAULT;
 
 	/*
 	 * Detect deadlocks.
 	 */
-	if ((unlikely((curval & FUTEX_TID_MASK) == vpid)))
+	if ((unlikely((uval & FUTEX_TID_MASK) == vpid)))
 		return -EDEADLK;
 
 	/*
-	 * Surprise - we got the lock, but we do not trust user space at all.
+	 * Lookup existing state first. If it exists, try to attach to
+	 * its pi_state.
 	 */
-	if (unlikely(!curval)) {
-		/*
-		 * We verify whether there is kernel state for this
-		 * futex. If not, we can safely assume, that the 0 ->
-		 * TID transition is correct. If state exists, we do
-		 * not bother to fixup the user space state as it was
-		 * corrupted already.
-		 */
-		return futex_top_waiter(hb, key) ? -EINVAL : 1;
-	}
-
-	uval = curval;
-
-	/*
-	 * Set the FUTEX_WAITERS flag, so the owner will know it has someone
-	 * to wake at the next unlock.
-	 */
-	newval = curval | FUTEX_WAITERS;
+	match = futex_top_waiter(hb, key);
+	if (match)
+		return attach_to_pi_state(uval, match->pi_state, ps);
 
 	/*
-	 * Should we force take the futex? See below.
+	 * No waiter and user TID is 0. We are here because the
+	 * waiters or the owner died bit is set or called from
+	 * requeue_cmp_pi or for whatever reason something took the
+	 * syscall.
 	 */
-	if (unlikely(force_take)) {
+	if (!(uval & FUTEX_TID_MASK)) {
 		/*
-		 * Keep the OWNER_DIED and the WAITERS bit and set the
-		 * new TID value.
+		 * We take over the futex. No other waiters and the user space
+		 * TID is 0. We preserve the owner died bit.
 		 */
-		newval = (curval & ~FUTEX_TID_MASK) | vpid;
-		force_take = 0;
-		lock_taken = 1;
-	}
+		newval = uval & FUTEX_OWNER_DIED;
+		newval |= vpid;
 
-	if (unlikely(cmpxchg_futex_value_locked(&curval, uaddr, uval, newval)))
-		return -EFAULT;
-	if (unlikely(curval != uval))
-		goto retry;
+		/* The futex requeue_pi code can enforce the waiters bit */
+		if (set_waiters)
+			newval |= FUTEX_WAITERS;
+
+		ret = lock_pi_update_atomic(uaddr, uval, newval);
+		/* If the take over worked, return 1 */
+		return ret < 0 ? ret : 1;
+	}
 
 	/*
-	 * We took the lock due to forced take over.
+	 * First waiter. Set the waiters bit before attaching ourself to
+	 * the owner. If owner tries to unlock, it will be forced into
+	 * the kernel and blocked on hb->lock.
 	 */
-	if (unlikely(lock_taken))
-		return 1;
-
+	newval = uval | FUTEX_WAITERS;
+	ret = lock_pi_update_atomic(uaddr, uval, newval);
+	if (ret)
+		return ret;
 	/*
-	 * We dont have the lock. Look up the PI state (or create it if
-	 * we are the first waiter):
+	 * If the update of the user space value succeeded, we try to
+	 * attach to the owner. If that fails, no harm done, we only
+	 * set the FUTEX_WAITERS bit in the user space variable.
 	 */
-	ret = lookup_pi_state(uval, hb, key, ps);
-
-	if (unlikely(ret)) {
-		switch (ret) {
-		case -ESRCH:
-			/*
-			 * We failed to find an owner for this
-			 * futex. So we have no pi_state to block
-			 * on. This can happen in two cases:
-			 *
-			 * 1) The owner died
-			 * 2) A stale FUTEX_WAITERS bit
-			 *
-			 * Re-read the futex value.
-			 */
-			if (get_futex_value_locked(&curval, uaddr))
-				return -EFAULT;
-
-			/*
-			 * If the owner died or we have a stale
-			 * WAITERS bit the owner TID in the user space
-			 * futex is 0.
-			 */
-			if (!(curval & FUTEX_TID_MASK)) {
-				force_take = 1;
-				goto retry;
-			}
-		default:
-			break;
-		}
-	}
-
-	return ret;
+	return attach_to_pi_owner(uval, key, ps);
 }
 
 /**
@@ -1659,7 +1626,12 @@ retry_private:
 				goto retry;
 			goto out;
 		case -EAGAIN:
-			/* The owner was exiting, try again. */
+			/*
+			 * Two reasons for this:
+			 * - Owner is exiting and we just wait for the
+			 *   exit to complete.
+			 * - The user space value changed.
+			 */
 			double_unlock_hb(hb1, hb2);
 			hb_waiters_dec(hb2);
 			put_futex_key(&key2);
@@ -2316,8 +2288,10 @@ retry_private:
 			goto uaddr_faulted;
 		case -EAGAIN:
 			/*
-			 * Task is exiting and we just wait for the
-			 * exit to complete.
+			 * Two reasons for this:
+			 * - Task is exiting and we just wait for the
+			 *   exit to complete.
+			 * - The user space value changed.
 			 */
 			queue_unlock(hb);
 			put_futex_key(&q.key);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2014-06-21 20:35 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-11 20:45 [patch 0/5] futex: More robustness tweaks Thomas Gleixner
2014-06-11 20:45 ` [patch 1/5] futex: Make unlock_pi more robust Thomas Gleixner
2014-06-16 16:18   ` Darren Hart
2014-06-16 22:15     ` Thomas Gleixner
2014-06-16 22:28       ` Thomas Gleixner
2014-06-16 22:49         ` Darren Hart
2014-06-16 22:39       ` Darren Hart
2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
2014-06-11 20:45 ` [patch 3/5] futex: Split out the waiter check from lookup_pi_state() Thomas Gleixner
2014-06-16 18:12   ` Darren Hart
2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
2014-06-11 20:45 ` [patch 2/5] futex: Use futex_top_waiter() in lookup_pi_state() Thomas Gleixner
2014-06-16 16:51   ` Darren Hart
2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
2014-06-11 20:45 ` [patch 4/5] futex: Split out the first waiter attachment from lookup_pi_state() Thomas Gleixner
2014-06-16 18:19   ` Darren Hart
2014-06-21 20:33   ` [tip:locking/core] " tip-bot for Thomas Gleixner
2014-06-11 20:45 ` [patch 5/5] futex: Simplify futex_lock_pi_atomic() and make it more robust Thomas Gleixner
2014-06-13  5:46   ` Darren Hart
2014-06-13  8:34     ` Thomas Gleixner
2014-06-13  9:36       ` Thomas Gleixner
2014-06-13  9:44         ` [patch V2 " Thomas Gleixner
2014-06-13 20:51           ` Davidlohr Bueso
2014-06-16 20:36           ` Darren Hart
2014-06-17  7:20             ` Thomas Gleixner
2014-06-21 20:34           ` [tip:locking/core] " tip-bot for Thomas Gleixner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.