All of lore.kernel.org
 help / color / mirror / Atom feed
* [tip PATCH v6 0/8] requeue pi implementation
@ 2009-03-30 21:37 Darren Hart
  2009-03-30 21:37 ` [tip PATCH v6 1/8] RFC: futex: futex_wait_queue_me() Darren Hart
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

The following series is v6 of the requeue_pi patches against
linux-2.6-tip/core/futexes + core/urgent + master.  The current futex
implementation doesn't allow for requeueing of PI futexes, which leads to a
thundering herd during pthread_cond_broadcasa()t (as opposed to a civilized
priority ordered wakeup sequence).  The core of the problem is that the
underlying rt_mutex cannot be left with waiters and no owner (which would break
the PI logic).  This patch series updates the futex code to allow for
requeueing from non-PI to PI futexes in support of PI aware pthread_cond_*
calls along with some needful rt_mutex helper routines.  The credit for the
conceptual design goes to Thomas Gleixner, while the bugs and other idiocies
present in this implementation should be attributed to me.

Since the last version I have reworked the requeue logic, renamed
futex_requeue_pi_init() to futex_proxy_trylock_atomic(), fixed several bugs in
timeout and signal handling in futex_wait_requeue_pi(), incorporated some
feedback from Thomas and others, and corrected some inconsistencies in my
comments and comment formats.

This version has been tested with a rough raw futex syscall test case as well
as with a preliminary glibc patch that updates the pthread_cond* calls to use
the new syscalls and allow for the PI calls to take ownership of the rt_mutex
inside the kernel (see the "glibc hacks for requeue_pi" at the end of this
series).  With this patched glibc the LTP realtime/func/prio-wake test case has
passed more than 6k consecutive iterations (whereas before it would fail 10% of
the time).  prio-wake tests the priority ordered wakeup of a
pthread_cond_broadcast() using a PI mutex.  I have exercised the timeout and
signal paths of futex_wait_requeue_pi() prior to requeue.  I am working to add
more sophisticated tests to be able to exercise the post-requeue paths as well.
Additionally, I'd like to add some fault-injection.

I'd really appreciate feedback on the implementation as well as any design
critique.  Answers to the questions posed in the patch headers and code
comments are particularly welcome.  If we agree on the general approach, I'd
like to refactor futex_wait_requeue_pi() as it is rather lengthy at this point.

---

Darren Hart (8):
      RFC: futex: add requeue_pi calls
      RFC: futex: Add requeue_futex() call
      RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags
      RFC: rt_mutex: add proxy lock routines
      RFC: futex: finish_futex_lock_pi()
      RFC: futex: futex_lock_pi_atomic()
      RFC: futex: futex_top_waiter()
      RFC: futex: futex_wait_queue_me()


 include/linux/futex.h       |    8 
 include/linux/thread_info.h |    3 
 kernel/futex.c              | 1116 +++++++++++++++++++++++++++++++++----------
 kernel/rtmutex.c            |  240 +++++++--
 kernel/rtmutex_common.h     |    8 
 5 files changed, 1063 insertions(+), 312 deletions(-)

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [tip PATCH v6 1/8] RFC: futex: futex_wait_queue_me()
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
@ 2009-03-30 21:37 ` Darren Hart
  2009-03-31  6:44   ` Thomas Gleixner
  2009-03-30 21:37 ` [tip PATCH v6 2/8] RFC: futex: futex_top_waiter() Darren Hart
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Darren Hart,
	Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

Refactor futex_wait() in preparation for futex_wait_requeue_pi().  In order to
reuse a good chunk of the futex_wait() code for the upcoming
futex_wait_requeue_pi() function, this patch breaks out the queue-to-wakeup
section of futex_wait() into futex_wait_queue_me().

Changelog:
V6: -Incremental build fixes
V4: -Nesting cleanups
    -Delayed hrtimer start until after setting TASK_INTERRUPTIBLE
V1: -Initial version

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 kernel/futex.c |  138 +++++++++++++++++++++++++++++++-------------------------
 1 files changed, 76 insertions(+), 62 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 6b50a02..b2ed762 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1115,24 +1115,87 @@ handle_fault:
 
 static long futex_wait_restart(struct restart_block *restart);
 
+/**
+ * futex_wait_queue_me() - queue_me and wait for wakeup, timeout, or signal.
+ * @hb:		the futex hash bucket, must be locked by the caller
+ * @q:		the futex_q to queue up on
+ * @timeout:	the prepared hrtimer_sleeper, or null for no timeout.
+ */
+static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
+				struct hrtimer_sleeper *timeout)
+{
+	DECLARE_WAITQUEUE(wait, current);
+
+	queue_me(q, hb);
+
+	/*
+	 * There might have been scheduling since the queue_me(), as we
+	 * cannot hold a spinlock across the get_user() in case it
+	 * faults, and we cannot just set TASK_INTERRUPTIBLE state when
+	 * queueing ourselves into the futex hash.  This code thus has to
+	 * rely on the futex_wake() code removing us from hash when it
+	 * wakes us up.
+	 */
+
+	/* add_wait_queue is the barrier after __set_current_state. */
+	__set_current_state(TASK_INTERRUPTIBLE);
+	add_wait_queue(&q->waiter, &wait);
+	/*
+	 * NOTE: we don't remove ourselves from the waitqueue because
+	 * we are the only user of it.
+	 */
+
+	/* Arm the timer */
+	if (timeout) {
+		hrtimer_start_expires(&timeout->timer, HRTIMER_MODE_ABS);
+		if (!hrtimer_active(&timeout->timer))
+			timeout->task = NULL;
+	}
+
+	/*
+	 * !plist_node_empty() is safe here without any lock.
+	 * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
+	 */
+	if (likely(!plist_node_empty(&q->list))) {
+		/*
+		 * If the timer has already expired, current will already be
+		 * flagged for rescheduling. Only call schedule if there
+		 * is no timeout, or if it has yet to expire.
+		 */
+		if (!timeout || likely(timeout->task))
+			schedule();
+	}
+	__set_current_state(TASK_RUNNING);
+}
+
 static int futex_wait(u32 __user *uaddr, int fshared,
 		      u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
 {
-	struct task_struct *curr = current;
+	struct hrtimer_sleeper timeout, *to = NULL;
 	struct restart_block *restart;
-	DECLARE_WAITQUEUE(wait, curr);
 	struct futex_hash_bucket *hb;
 	struct futex_q q;
 	u32 uval;
 	int ret;
-	struct hrtimer_sleeper t;
-	int rem = 0;
 
 	if (!bitset)
 		return -EINVAL;
 
 	q.pi_state = NULL;
 	q.bitset = bitset;
+
+	if (abs_time) {
+		unsigned long slack;
+		to = &timeout;
+		slack = current->timer_slack_ns;
+		if (rt_task(current))
+			slack = 0;
+		hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
+				      CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+		hrtimer_init_sleeper(to, current);
+		hrtimer_set_expires_range_ns(&to->timer, *abs_time, slack);
+	}
+
 retry:
 	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, fshared, &q.key);
@@ -1178,75 +1241,22 @@ retry_private:
 		goto retry;
 	}
 	ret = -EWOULDBLOCK;
+
+	/* Only actually queue if *uaddr contained val.  */
 	if (unlikely(uval != val)) {
 		queue_unlock(&q, hb);
 		goto out_put_key;
 	}
 
-	/* Only actually queue if *uaddr contained val.  */
-	queue_me(&q, hb);
-
-	/*
-	 * There might have been scheduling since the queue_me(), as we
-	 * cannot hold a spinlock across the get_user() in case it
-	 * faults, and we cannot just set TASK_INTERRUPTIBLE state when
-	 * queueing ourselves into the futex hash.  This code thus has to
-	 * rely on the futex_wake() code removing us from hash when it
-	 * wakes us up.
-	 */
-
-	/* add_wait_queue is the barrier after __set_current_state. */
-	__set_current_state(TASK_INTERRUPTIBLE);
-	add_wait_queue(&q.waiter, &wait);
-	/*
-	 * !plist_node_empty() is safe here without any lock.
-	 * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
-	 */
-	if (likely(!plist_node_empty(&q.list))) {
-		if (!abs_time)
-			schedule();
-		else {
-			hrtimer_init_on_stack(&t.timer,
-					      clockrt ? CLOCK_REALTIME :
-					      CLOCK_MONOTONIC,
-					      HRTIMER_MODE_ABS);
-			hrtimer_init_sleeper(&t, current);
-			hrtimer_set_expires_range_ns(&t.timer, *abs_time,
-						     current->timer_slack_ns);
-
-			hrtimer_start_expires(&t.timer, HRTIMER_MODE_ABS);
-			if (!hrtimer_active(&t.timer))
-				t.task = NULL;
-
-			/*
-			 * the timer could have already expired, in which
-			 * case current would be flagged for rescheduling.
-			 * Don't bother calling schedule.
-			 */
-			if (likely(t.task))
-				schedule();
-
-			hrtimer_cancel(&t.timer);
-
-			/* Flag if a timeout occured */
-			rem = (t.task == NULL);
-
-			destroy_hrtimer_on_stack(&t.timer);
-		}
-	}
-	__set_current_state(TASK_RUNNING);
-
-	/*
-	 * NOTE: we don't remove ourselves from the waitqueue because
-	 * we are the only user of it.
-	 */
+	/* queue_me and wait for wakeup, timeout, or a signal. */
+	futex_wait_queue_me(hb, &q, to);
 
 	/* If we were woken (and unqueued), we succeeded, whatever. */
 	ret = 0;
 	if (!unqueue_me(&q))
 		goto out_put_key;
 	ret = -ETIMEDOUT;
-	if (rem)
+	if (to && !to->task)
 		goto out_put_key;
 
 	/*
@@ -1275,6 +1285,10 @@ retry_private:
 out_put_key:
 	put_futex_key(fshared, &q.key);
 out:
+	if (to) {
+		hrtimer_cancel(&to->timer);
+		destroy_hrtimer_on_stack(&to->timer);
+	}
 	return ret;
 }
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip PATCH v6 2/8] RFC: futex: futex_top_waiter()
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
  2009-03-30 21:37 ` [tip PATCH v6 1/8] RFC: futex: futex_wait_queue_me() Darren Hart
@ 2009-03-30 21:37 ` Darren Hart
  2009-03-30 21:37 ` [tip PATCH v6 3/8] RFC: futex: futex_lock_pi_atomic() Darren Hart
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Darren Hart,
	Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

Improve legibility by wrapping finding the top waiter in a function.  This
will be used by the follow-on patches for enabling requeue pi.

Changelog:
V6: -Shortened function length after review from tglx
    -Corrected kernel-doc formatting
V4: -Simplified logic in the hb loop
    -Return the futex_q as the task can be had from there if needed
    -Updated comments and dropped the -safe iterator per review from tglx
V3: -Comment linewidth cleanup
V1: -Initial version

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 kernel/futex.c |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index b2ed762..8ebed88 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -276,6 +276,25 @@ void put_futex_key(int fshared, union futex_key *key)
 	drop_futex_key_refs(key);
 }
 
+/**
+ * futex_top_waiter() - return the highest priority waiter on a futex
+ * @hb:     the hash bucket the futex_q's reside in
+ * @key:    the futex key (to distinguish it from other futex futex_q's)
+ *
+ * Must be called with the hb lock held.
+ */
+static struct futex_q *futex_top_waiter(struct futex_hash_bucket *hb,
+					union futex_key *key)
+{
+	struct futex_q *this;
+
+	plist_for_each_entry(this, &hb->chain, list) {
+		if (match_futex(&this->key, key))
+			return this;
+	}
+	return NULL;
+}
+
 static u32 cmpxchg_futex_value_locked(u32 __user *uaddr, u32 uval, u32 newval)
 {
 	u32 curval;


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip PATCH v6 3/8] RFC: futex: futex_lock_pi_atomic()
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
  2009-03-30 21:37 ` [tip PATCH v6 1/8] RFC: futex: futex_wait_queue_me() Darren Hart
  2009-03-30 21:37 ` [tip PATCH v6 2/8] RFC: futex: futex_top_waiter() Darren Hart
@ 2009-03-30 21:37 ` Darren Hart
  2009-03-31  6:49   ` Thomas Gleixner
  2009-03-30 21:38 ` [tip PATCH v6 4/8] RFC: futex: finish_futex_lock_pi() Darren Hart
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:37 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Darren Hart,
	Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

Refactor the atomic portion of futex_lock_pi() into futex_lock_pi_atomic().
This logic will be needed elsewhere, so modularize it to reduce code
duplication.  The only significant change is passing of the task to try and
take the lock for.  This simplifies the -EDEADLK test as if the lock is owned
by task t, it's a deadlock, regardless of if we are doing requeue pi or not.
This patch updates the corresponding comment accordingly.

Changelog:
V6: -comment update from peterz
V4: -fixes for style errors caught by checkpatch.pl
    -use task instead of t for legibility whilst searching
    -pass key and pi_state directly rather than trying to rely on the futex_q
     since the key may be different.
    -minor cleanups suggested by tglx
V2: -Initial version

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 kernel/futex.c |  226 +++++++++++++++++++++++++++++++++-----------------------
 1 files changed, 132 insertions(+), 94 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 8ebed88..905b52a 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -556,6 +556,129 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 	return 0;
 }
 
+/**
+ * futex_lock_pi_atomic() - atomic work required to acquire a pi aware futex
+ * @uaddr:	the pi futex user address
+ * @hb:		the pi futex hash bucket
+ * @key:	the futex key associated with uaddr and hb
+ * @ps:		the pi_state pointer where we store the result of the lookup
+ * @task:	the task to perform the atomic lock work for.  This will be
+ * 		"current" except in the case of requeue pi.
+ *
+ * Returns:
+ *  0 - ready to wait
+ *  1 - acquired the lock
+ * <0 - error
+ *
+ * The hb->lock and futex_key refs shall be held by the caller.
+ */
+static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
+				union futex_key *key,
+				struct futex_pi_state **ps,
+				struct task_struct *task)
+{
+	u32 uval, newval, curval;
+	int lock_taken;
+	int ret;
+	int ownerdied = 0;
+
+retry:
+	ret = lock_taken = 0;
+
+	/*
+	 * To avoid races, we attempt to take the lock here again
+	 * (by doing a 0 -> TID atomic cmpxchg), while holding all
+	 * the locks. It will most likely not succeed.
+	 */
+	newval = task_pid_vnr(task);
+
+	curval = cmpxchg_futex_value_locked(uaddr, 0, newval);
+
+	if (unlikely(curval == -EFAULT))
+		return -EFAULT;
+
+	/*
+	 * Detect deadlocks.
+	 */
+	if ((unlikely((curval & FUTEX_TID_MASK) == task_pid_vnr(task))))
+		return -EDEADLK;
+
+	/*
+	 * Surprise - we got the lock. Just return to userspace:
+	 */
+	if (unlikely(!curval))
+		return 1;
+
+	uval = curval;
+
+	/*
+	 * Set the FUTEX_WAITERS flag, so the owner will know it has someone
+	 * to wake at the next unlock.
+	 */
+	newval = curval | FUTEX_WAITERS;
+
+	/*
+	 * There are two cases, where a futex might have no owner (the
+	 * owner TID is 0): OWNER_DIED. We take over the futex in this
+	 * case. We also do an unconditional take over, when the owner
+	 * of the futex died.
+	 *
+	 * This is safe as we are protected by the hash bucket lock !
+	 */
+	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
+		/* Keep the OWNER_DIED bit */
+		newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
+		ownerdied = 0;
+		lock_taken = 1;
+	}
+
+	curval = cmpxchg_futex_value_locked(uaddr, uval, newval);
+
+	if (unlikely(curval == -EFAULT))
+		return -EFAULT;
+	if (unlikely(curval != uval))
+		goto retry;
+
+	/*
+	 * We took the lock due to owner died take over.
+	 */
+	if (unlikely(lock_taken))
+		return 1;
+
+	/*
+	 * We dont have the lock. Look up the PI state (or create it if
+	 * we are the first waiter):
+	 */
+	ret = lookup_pi_state(uval, hb, key, ps);
+
+	if (unlikely(ret)) {
+		switch (ret) {
+		case -ESRCH:
+			/*
+			 * No owner found for this futex. Check if the
+			 * OWNER_DIED bit is set to figure out whether
+			 * this is a robust futex or not.
+			 */
+			if (get_futex_value_locked(&curval, uaddr))
+				return -EFAULT;
+
+			/*
+			 * We simply start over in case of a robust
+			 * futex. The code above will take the futex
+			 * and return happy.
+			 */
+			if (curval & FUTEX_OWNER_DIED) {
+				ownerdied = 1;
+				goto retry;
+			}
+		default:
+			break;
+		}
+	}
+
+	return ret;
+}
+
 /*
  * The hash bucket lock must be held when this is called.
  * Afterwards, the futex_q must not be accessed.
@@ -1340,9 +1463,9 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct task_struct *curr = current;
 	struct futex_hash_bucket *hb;
-	u32 uval, newval, curval;
+	u32 uval;
 	struct futex_q q;
-	int ret, lock_taken, ownerdied = 0;
+	int ret;
 
 	if (refill_pi_state_cache())
 		return -ENOMEM;
@@ -1365,81 +1488,15 @@ retry:
 retry_private:
 	hb = queue_lock(&q);
 
-retry_locked:
-	ret = lock_taken = 0;
-
-	/*
-	 * To avoid races, we attempt to take the lock here again
-	 * (by doing a 0 -> TID atomic cmpxchg), while holding all
-	 * the locks. It will most likely not succeed.
-	 */
-	newval = task_pid_vnr(current);
-
-	curval = cmpxchg_futex_value_locked(uaddr, 0, newval);
-
-	if (unlikely(curval == -EFAULT))
-		goto uaddr_faulted;
-
-	/*
-	 * Detect deadlocks. In case of REQUEUE_PI this is a valid
-	 * situation and we return success to user space.
-	 */
-	if (unlikely((curval & FUTEX_TID_MASK) == task_pid_vnr(current))) {
-		ret = -EDEADLK;
-		goto out_unlock_put_key;
-	}
-
-	/*
-	 * Surprise - we got the lock. Just return to userspace:
-	 */
-	if (unlikely(!curval))
-		goto out_unlock_put_key;
-
-	uval = curval;
-
-	/*
-	 * Set the WAITERS flag, so the owner will know it has someone
-	 * to wake at next unlock
-	 */
-	newval = curval | FUTEX_WAITERS;
-
-	/*
-	 * There are two cases, where a futex might have no owner (the
-	 * owner TID is 0): OWNER_DIED. We take over the futex in this
-	 * case. We also do an unconditional take over, when the owner
-	 * of the futex died.
-	 *
-	 * This is safe as we are protected by the hash bucket lock !
-	 */
-	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
-		/* Keep the OWNER_DIED bit */
-		newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(current);
-		ownerdied = 0;
-		lock_taken = 1;
-	}
-
-	curval = cmpxchg_futex_value_locked(uaddr, uval, newval);
-
-	if (unlikely(curval == -EFAULT))
-		goto uaddr_faulted;
-	if (unlikely(curval != uval))
-		goto retry_locked;
-
-	/*
-	 * We took the lock due to owner died take over.
-	 */
-	if (unlikely(lock_taken))
-		goto out_unlock_put_key;
-
-	/*
-	 * We dont have the lock. Look up the PI state (or create it if
-	 * we are the first waiter):
-	 */
-	ret = lookup_pi_state(uval, hb, &q.key, &q.pi_state);
-
+	ret = futex_lock_pi_atomic(uaddr, hb, &q.key, &q.pi_state, current);
 	if (unlikely(ret)) {
 		switch (ret) {
-
+		case 1:
+			/* We got the lock. */
+			ret = 0;
+			goto out_unlock_put_key;
+		case -EFAULT:
+			goto uaddr_faulted;
 		case -EAGAIN:
 			/*
 			 * Task is exiting and we just wait for the
@@ -1449,25 +1506,6 @@ retry_locked:
 			put_futex_key(fshared, &q.key);
 			cond_resched();
 			goto retry;
-
-		case -ESRCH:
-			/*
-			 * No owner found for this futex. Check if the
-			 * OWNER_DIED bit is set to figure out whether
-			 * this is a robust futex or not.
-			 */
-			if (get_futex_value_locked(&curval, uaddr))
-				goto uaddr_faulted;
-
-			/*
-			 * We simply start over in case of a robust
-			 * futex. The code above will take the futex
-			 * and return happy.
-			 */
-			if (curval & FUTEX_OWNER_DIED) {
-				ownerdied = 1;
-				goto retry_locked;
-			}
 		default:
 			goto out_unlock_put_key;
 		}


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip PATCH v6 4/8] RFC: futex: finish_futex_lock_pi()
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
                   ` (2 preceding siblings ...)
  2009-03-30 21:37 ` [tip PATCH v6 3/8] RFC: futex: futex_lock_pi_atomic() Darren Hart
@ 2009-03-30 21:38 ` Darren Hart
  2009-03-30 21:38 ` [tip PATCH v6 5/8] RFC: rt_mutex: add proxy lock routines Darren Hart
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Darren Hart,
	Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

Refactor the post lock acquisition logic from futex_lock_pi().  This code will
be reused in futex_wait_requeue_pi().

V6: -Minor comment updates
V4: -Corrected string paranoia check
    -Move the spinlock(q->lock_ptr) out of finish_futex_lock_pi to retain
     some semblance of lock/unlock in the same function.
V3: -Initial version

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 kernel/futex.c |  147 +++++++++++++++++++++++++++++++-------------------------
 1 files changed, 81 insertions(+), 66 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 905b52a..dd05803 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1258,6 +1258,84 @@ handle_fault:
 static long futex_wait_restart(struct restart_block *restart);
 
 /**
+ * finish_futex_lock_pi() - Post lock pi_state and corner case management
+ * @uaddr:	user address of the futex
+ * @fshared:	whether the futex is shared (1) or not (0)
+ * @q:		futex_q (contains pi_state and access to the rt_mutex)
+ * @ret:	the return value of the preceeding attempt to lock the rt_mutex
+ *
+ * After attempting to lock an rt_mutex, process the return code and cleanup
+ * the pi_state as well as handle race conditions that may have caused us to
+ * lose the lock. Must be called with the hb lock held.
+ *
+ * Returns:
+ *  0 - on success
+ * <0 - on error
+ */
+static int finish_futex_lock_pi(u32 __user *uaddr, int fshared,
+				struct futex_q *q, int ret)
+{
+	if (!ret) {
+		/*
+		 * Got the lock. We might not be the anticipated owner
+		 * if we did a lock-steal - fix up the PI-state in
+		 * that case:
+		 */
+		if (q->pi_state->owner != current)
+			ret = fixup_pi_state_owner(uaddr, q, current, fshared);
+		return ret;
+	}
+
+	/*
+	 * Catch the rare case, where the lock was released when we were on the
+	 * way back before we locked the hash bucket.
+	 */
+	if (q->pi_state->owner == current) {
+		/*
+		 * Try to get the rt_mutex now. This might fail as some other
+		 * task acquired the rt_mutex after we removed ourself from the
+		 * rt_mutex waiters list.
+		 */
+		if (rt_mutex_trylock(&q->pi_state->pi_mutex))
+			ret = 0;
+		else {
+			/*
+			 * pi_state is incorrect, some other task did a lock
+			 * steal and we returned due to timeout or signal
+			 * without taking the rt_mutex. Too late. We can access
+			 * the rt_mutex_owner without locking, as the other
+			 * task is now blocked on the hash bucket lock. Fix the
+			 * state up.
+			 */
+			struct task_struct *owner;
+			int res;
+
+			owner = rt_mutex_owner(&q->pi_state->pi_mutex);
+			res = fixup_pi_state_owner(uaddr, q, owner,
+					fshared);
+
+			/* propagate -EFAULT, if the fixup failed */
+			if (res)
+				ret = res;
+		}
+	} else {
+		/*
+		 * Paranoia check. If we did not take the lock in the trylock
+		 * above, then we should not be the owner of the rtmutex,
+		 * neither the real nor the pending one:
+		 */
+		if (rt_mutex_owner(&q->pi_state->pi_mutex) == current)
+			printk(KERN_ERR "finish_futex_lock_pi: "
+					"ret = %d pi-mutex: %p "
+					"pi-state %p\n", ret,
+					q->pi_state->pi_mutex.owner,
+					q->pi_state->owner);
+	}
+
+	return ret;
+}
+
+/**
  * futex_wait_queue_me() - queue_me and wait for wakeup, timeout, or signal.
  * @hb:		the futex hash bucket, must be locked by the caller
  * @q:		the futex_q to queue up on
@@ -1461,7 +1539,6 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
 			 int detect, ktime_t *time, int trylock)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
-	struct task_struct *curr = current;
 	struct futex_hash_bucket *hb;
 	u32 uval;
 	struct futex_q q;
@@ -1529,67 +1606,7 @@ retry_private:
 	}
 
 	spin_lock(q.lock_ptr);
-
-	if (!ret) {
-		/*
-		 * Got the lock. We might not be the anticipated owner
-		 * if we did a lock-steal - fix up the PI-state in
-		 * that case:
-		 */
-		if (q.pi_state->owner != curr)
-			ret = fixup_pi_state_owner(uaddr, &q, curr, fshared);
-	} else {
-		/*
-		 * Catch the rare case, where the lock was released
-		 * when we were on the way back before we locked the
-		 * hash bucket.
-		 */
-		if (q.pi_state->owner == curr) {
-			/*
-			 * Try to get the rt_mutex now. This might
-			 * fail as some other task acquired the
-			 * rt_mutex after we removed ourself from the
-			 * rt_mutex waiters list.
-			 */
-			if (rt_mutex_trylock(&q.pi_state->pi_mutex))
-				ret = 0;
-			else {
-				/*
-				 * pi_state is incorrect, some other
-				 * task did a lock steal and we
-				 * returned due to timeout or signal
-				 * without taking the rt_mutex. Too
-				 * late. We can access the
-				 * rt_mutex_owner without locking, as
-				 * the other task is now blocked on
-				 * the hash bucket lock. Fix the state
-				 * up.
-				 */
-				struct task_struct *owner;
-				int res;
-
-				owner = rt_mutex_owner(&q.pi_state->pi_mutex);
-				res = fixup_pi_state_owner(uaddr, &q, owner,
-							   fshared);
-
-				/* propagate -EFAULT, if the fixup failed */
-				if (res)
-					ret = res;
-			}
-		} else {
-			/*
-			 * Paranoia check. If we did not take the lock
-			 * in the trylock above, then we should not be
-			 * the owner of the rtmutex, neither the real
-			 * nor the pending one:
-			 */
-			if (rt_mutex_owner(&q.pi_state->pi_mutex) == curr)
-				printk(KERN_ERR "futex_lock_pi: ret = %d "
-				       "pi-mutex: %p pi-state %p\n", ret,
-				       q.pi_state->pi_mutex.owner,
-				       q.pi_state->owner);
-		}
-	}
+	ret = finish_futex_lock_pi(uaddr, fshared, &q, ret);
 
 	/*
 	 * If fixup_pi_state_owner() faulted and was unable to handle the
@@ -1601,9 +1618,7 @@ retry_private:
 	/* Unqueue and drop the lock */
 	unqueue_me_pi(&q);
 
-	if (to)
-		destroy_hrtimer_on_stack(&to->timer);
-	return ret != -EINTR ? ret : -ERESTARTNOINTR;
+	goto out;
 
 out_unlock_put_key:
 	queue_unlock(&q, hb);
@@ -1613,7 +1628,7 @@ out_put_key:
 out:
 	if (to)
 		destroy_hrtimer_on_stack(&to->timer);
-	return ret;
+	return ret != -EINTR ? ret : -ERESTARTNOINTR;
 
 uaddr_faulted:
 	/*


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip PATCH v6 5/8] RFC: rt_mutex: add proxy lock routines
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
                   ` (3 preceding siblings ...)
  2009-03-30 21:38 ` [tip PATCH v6 4/8] RFC: futex: finish_futex_lock_pi() Darren Hart
@ 2009-03-30 21:38 ` Darren Hart
  2009-03-30 21:38 ` [tip PATCH v6 6/8] RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags Darren Hart
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Darren Hart,
	Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

This patch is required for the first half of requeue_pi to function.  It
basically splits rt_mutex_slowlock() right down the middle, just before the
first call to schedule().

This patch uses a new futex_q field, rt_waiter, for now.  I think
I should be able to use task->pi_blocked_on in a future version of this patch.

V6: -add mark_rt_mutex_waiters() to rt_mutex_start_procy_lock() to avoid
     the race condition evident in previous versions
    -cleanup kernel-docs formatting and comments
    -try to take the lock in rt_mutex_start_proxy_lock() rather than assume
     the lock is held
    -remove initial schedule in finish_proxy_lock to allow for signal and
     timeout detection.
V5: -remove EXPORT_SYMBOL_GPL from the new routines
    -minor cleanups
V4: -made detect_deadlock a parameter to rt_mutex_enqueue_task
    -refactored rt_mutex_slowlock to share code with new functions
    -renamed rt_mutex_enqueue_task and rt_mutex_handle_wakeup to
     rt_mutex_start_proxy_lock and rt_mutex_finish_proxy_lock, respectively

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 kernel/rtmutex.c        |  240 +++++++++++++++++++++++++++++++++++++----------
 kernel/rtmutex_common.h |    8 ++
 2 files changed, 195 insertions(+), 53 deletions(-)

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 69d9cb9..fec77e7 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -300,7 +300,8 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
  * assigned pending owner [which might not have taken the
  * lock yet]:
  */
-static inline int try_to_steal_lock(struct rt_mutex *lock)
+static inline int try_to_steal_lock(struct rt_mutex *lock,
+				    struct task_struct *task)
 {
 	struct task_struct *pendowner = rt_mutex_owner(lock);
 	struct rt_mutex_waiter *next;
@@ -309,11 +310,11 @@ static inline int try_to_steal_lock(struct rt_mutex *lock)
 	if (!rt_mutex_owner_pending(lock))
 		return 0;
 
-	if (pendowner == current)
+	if (pendowner == task)
 		return 1;
 
 	spin_lock_irqsave(&pendowner->pi_lock, flags);
-	if (current->prio >= pendowner->prio) {
+	if (task->prio >= pendowner->prio) {
 		spin_unlock_irqrestore(&pendowner->pi_lock, flags);
 		return 0;
 	}
@@ -338,21 +339,21 @@ static inline int try_to_steal_lock(struct rt_mutex *lock)
 	 * We are going to steal the lock and a waiter was
 	 * enqueued on the pending owners pi_waiters queue. So
 	 * we have to enqueue this waiter into
-	 * current->pi_waiters list. This covers the case,
-	 * where current is boosted because it holds another
+	 * task->pi_waiters list. This covers the case,
+	 * where task is boosted because it holds another
 	 * lock and gets unboosted because the booster is
 	 * interrupted, so we would delay a waiter with higher
-	 * priority as current->normal_prio.
+	 * priority as task->normal_prio.
 	 *
 	 * Note: in the rare case of a SCHED_OTHER task changing
 	 * its priority and thus stealing the lock, next->task
-	 * might be current:
+	 * might be task:
 	 */
-	if (likely(next->task != current)) {
-		spin_lock_irqsave(&current->pi_lock, flags);
-		plist_add(&next->pi_list_entry, &current->pi_waiters);
-		__rt_mutex_adjust_prio(current);
-		spin_unlock_irqrestore(&current->pi_lock, flags);
+	if (likely(next->task != task)) {
+		spin_lock_irqsave(&task->pi_lock, flags);
+		plist_add(&next->pi_list_entry, &task->pi_waiters);
+		__rt_mutex_adjust_prio(task);
+		spin_unlock_irqrestore(&task->pi_lock, flags);
 	}
 	return 1;
 }
@@ -389,7 +390,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock)
 	 */
 	mark_rt_mutex_waiters(lock);
 
-	if (rt_mutex_owner(lock) && !try_to_steal_lock(lock))
+	if (rt_mutex_owner(lock) && !try_to_steal_lock(lock, current))
 		return 0;
 
 	/* We got the lock. */
@@ -411,6 +412,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock)
  */
 static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
 				   struct rt_mutex_waiter *waiter,
+				   struct task_struct *task,
 				   int detect_deadlock)
 {
 	struct task_struct *owner = rt_mutex_owner(lock);
@@ -418,21 +420,21 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
 	unsigned long flags;
 	int chain_walk = 0, res;
 
-	spin_lock_irqsave(&current->pi_lock, flags);
-	__rt_mutex_adjust_prio(current);
-	waiter->task = current;
+	spin_lock_irqsave(&task->pi_lock, flags);
+	__rt_mutex_adjust_prio(task);
+	waiter->task = task;
 	waiter->lock = lock;
-	plist_node_init(&waiter->list_entry, current->prio);
-	plist_node_init(&waiter->pi_list_entry, current->prio);
+	plist_node_init(&waiter->list_entry, task->prio);
+	plist_node_init(&waiter->pi_list_entry, task->prio);
 
 	/* Get the top priority waiter on the lock */
 	if (rt_mutex_has_waiters(lock))
 		top_waiter = rt_mutex_top_waiter(lock);
 	plist_add(&waiter->list_entry, &lock->wait_list);
 
-	current->pi_blocked_on = waiter;
+	task->pi_blocked_on = waiter;
 
-	spin_unlock_irqrestore(&current->pi_lock, flags);
+	spin_unlock_irqrestore(&task->pi_lock, flags);
 
 	if (waiter == rt_mutex_top_waiter(lock)) {
 		spin_lock_irqsave(&owner->pi_lock, flags);
@@ -460,7 +462,7 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
 	spin_unlock(&lock->wait_lock);
 
 	res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock, waiter,
-					 current);
+					 task);
 
 	spin_lock(&lock->wait_lock);
 
@@ -605,37 +607,25 @@ void rt_mutex_adjust_pi(struct task_struct *task)
 	rt_mutex_adjust_prio_chain(task, 0, NULL, NULL, task);
 }
 
-/*
- * Slow path lock function:
+/**
+ * __rt_mutex_slowlock() - Perform the wait-wake-try-to-take loop
+ * @lock:		 the rt_mutex to take
+ * @state:		 the state the task should block in (TASK_INTERRUPTIBLE
+ * 			 or TASK_UNINTERRUPTIBLE)
+ * @timeout:		 the pre-initialized and started timer, or NULL for none
+ * @waiter:		 the pre-initialized rt_mutex_waiter
+ * @detect_deadlock:	 passed to task_blocks_on_rt_mutex
+ *
+ * lock->wait_lock must be held by the caller.
  */
 static int __sched
-rt_mutex_slowlock(struct rt_mutex *lock, int state,
-		  struct hrtimer_sleeper *timeout,
-		  int detect_deadlock)
+__rt_mutex_slowlock(struct rt_mutex *lock, int state,
+		    struct hrtimer_sleeper *timeout,
+		    struct rt_mutex_waiter *waiter,
+		    int detect_deadlock)
 {
-	struct rt_mutex_waiter waiter;
 	int ret = 0;
 
-	debug_rt_mutex_init_waiter(&waiter);
-	waiter.task = NULL;
-
-	spin_lock(&lock->wait_lock);
-
-	/* Try to acquire the lock again: */
-	if (try_to_take_rt_mutex(lock)) {
-		spin_unlock(&lock->wait_lock);
-		return 0;
-	}
-
-	set_current_state(state);
-
-	/* Setup the timer, when timeout != NULL */
-	if (unlikely(timeout)) {
-		hrtimer_start_expires(&timeout->timer, HRTIMER_MODE_ABS);
-		if (!hrtimer_active(&timeout->timer))
-			timeout->task = NULL;
-	}
-
 	for (;;) {
 		/* Try to acquire the lock: */
 		if (try_to_take_rt_mutex(lock))
@@ -656,19 +646,19 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 		}
 
 		/*
-		 * waiter.task is NULL the first time we come here and
+		 * waiter->task is NULL the first time we come here and
 		 * when we have been woken up by the previous owner
 		 * but the lock got stolen by a higher prio task.
 		 */
-		if (!waiter.task) {
-			ret = task_blocks_on_rt_mutex(lock, &waiter,
+		if (!waiter->task) {
+			ret = task_blocks_on_rt_mutex(lock, waiter, current,
 						      detect_deadlock);
 			/*
 			 * If we got woken up by the owner then start loop
 			 * all over without going into schedule to try
 			 * to get the lock now:
 			 */
-			if (unlikely(!waiter.task)) {
+			if (unlikely(!waiter->task)) {
 				/*
 				 * Reset the return value. We might
 				 * have returned with -EDEADLK and the
@@ -684,15 +674,52 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 
 		spin_unlock(&lock->wait_lock);
 
-		debug_rt_mutex_print_deadlock(&waiter);
+		debug_rt_mutex_print_deadlock(waiter);
 
-		if (waiter.task)
+		if (waiter->task)
 			schedule_rt_mutex(lock);
 
 		spin_lock(&lock->wait_lock);
 		set_current_state(state);
 	}
 
+	return ret;
+}
+
+/*
+ * Slow path lock function:
+ */
+static int __sched
+rt_mutex_slowlock(struct rt_mutex *lock, int state,
+		  struct hrtimer_sleeper *timeout,
+		  int detect_deadlock)
+{
+	struct rt_mutex_waiter waiter;
+	int ret = 0;
+
+	debug_rt_mutex_init_waiter(&waiter);
+	waiter.task = NULL;
+
+	spin_lock(&lock->wait_lock);
+
+	/* Try to acquire the lock again: */
+	if (try_to_take_rt_mutex(lock)) {
+		spin_unlock(&lock->wait_lock);
+		return 0;
+	}
+
+	set_current_state(state);
+
+	/* Setup the timer, when timeout != NULL */
+	if (unlikely(timeout)) {
+		hrtimer_start_expires(&timeout->timer, HRTIMER_MODE_ABS);
+		if (!hrtimer_active(&timeout->timer))
+			timeout->task = NULL;
+	}
+
+	ret = __rt_mutex_slowlock(lock, state, timeout, &waiter,
+				  detect_deadlock);
+
 	set_current_state(TASK_RUNNING);
 
 	if (unlikely(waiter.task))
@@ -986,6 +1013,59 @@ void rt_mutex_proxy_unlock(struct rt_mutex *lock,
 }
 
 /**
+ * rt_mutex_start_proxy_lock() - Start lock acquisition for another task
+ * @lock:		the rt_mutex to take
+ * @waiter:		the pre-initialized rt_mutex_waiter
+ * @task:		the task to prepare
+ * @detect_deadlock:	perform deadlock detection (1) or not (0)
+ *
+ * Returns:
+ *  0 - task blocked on lock
+ *  1 - acquired the lock for task, caller should wake it up
+ * <0 - error
+ *
+ * Special API call for FUTEX_REQUEUE_PI support.
+ */
+int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+			      struct rt_mutex_waiter *waiter,
+			      struct task_struct *task, int detect_deadlock)
+{
+	int ret;
+
+	spin_lock(&lock->wait_lock);
+
+	mark_rt_mutex_waiters(lock);
+
+	if (!rt_mutex_owner(lock) || try_to_steal_lock(lock, task)) {
+		/* We got the lock for task. */
+		debug_rt_mutex_lock(lock);
+
+		rt_mutex_set_owner(lock, task, 0);
+
+		rt_mutex_deadlock_account_lock(lock, task);
+		return 1;
+	}
+
+	ret = task_blocks_on_rt_mutex(lock, waiter, task, detect_deadlock);
+
+
+	if (ret && !waiter->task) {
+		/*
+		 * Reset the return value. We might have
+		 * returned with -EDEADLK and the owner
+		 * released the lock while we were walking the
+		 * pi chain.  Let the waiter sort it out.
+		 */
+		ret = 0;
+	}
+	spin_unlock(&lock->wait_lock);
+
+	debug_rt_mutex_print_deadlock(waiter);
+
+	return ret;
+}
+
+/**
  * rt_mutex_next_owner - return the next owner of the lock
  *
  * @lock: the rt lock query
@@ -1004,3 +1084,57 @@ struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock)
 
 	return rt_mutex_top_waiter(lock)->task;
 }
+
+/**
+ * rt_mutex_finish_proxy_lock() - Complete lock acquisition
+ * @lock:		the rt_mutex we were woken on
+ * @to:			the timeout, null if none. hrtimer should already have
+ * 			been started.
+ * @waiter:		the pre-initialized rt_mutex_waiter
+ * @detect_deadlock:	perform deadlock detection (1) or not (0)
+ *
+ * Complete the lock acquisition started our behalf by another thread.
+ *
+ * Returns:
+ *  0 - success
+ * <0 - error, one of -EINTR, -ETIMEDOUT, or -EDEADLK
+ *
+ * Special API call for PI-futex requeue support
+ */
+int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
+			       struct hrtimer_sleeper *to,
+			       struct rt_mutex_waiter *waiter,
+			       int detect_deadlock)
+{
+	int ret;
+
+	spin_lock(&lock->wait_lock);
+
+	set_current_state(TASK_INTERRUPTIBLE);
+
+	ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter,
+				  detect_deadlock);
+
+	set_current_state(TASK_RUNNING);
+
+	if (unlikely(waiter->task))
+		remove_waiter(lock, waiter);
+
+	/*
+	 * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
+	 * have to fix that up.
+	 */
+	fixup_rt_mutex_waiters(lock);
+
+	spin_unlock(&lock->wait_lock);
+
+	/*
+	 * Readjust priority, when we did not get the lock. We might have been
+	 * the pending owner and boosted. Since we did not take the lock, the
+	 * PI boost has to go.
+	 */
+	if (unlikely(ret))
+		rt_mutex_adjust_prio(current);
+
+	return ret;
+}
diff --git a/kernel/rtmutex_common.h b/kernel/rtmutex_common.h
index e124bf5..97a2f81 100644
--- a/kernel/rtmutex_common.h
+++ b/kernel/rtmutex_common.h
@@ -120,6 +120,14 @@ extern void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
 				       struct task_struct *proxy_owner);
 extern void rt_mutex_proxy_unlock(struct rt_mutex *lock,
 				  struct task_struct *proxy_owner);
+extern int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
+				     struct rt_mutex_waiter *waiter,
+				     struct task_struct *task,
+				     int detect_deadlock);
+extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
+				      struct hrtimer_sleeper *to,
+				      struct rt_mutex_waiter *waiter,
+				      int detect_deadlock);
 
 #ifdef CONFIG_DEBUG_RT_MUTEXES
 # include "rtmutex-debug.h"


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip PATCH v6 6/8] RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
                   ` (4 preceding siblings ...)
  2009-03-30 21:38 ` [tip PATCH v6 5/8] RFC: rt_mutex: add proxy lock routines Darren Hart
@ 2009-03-30 21:38 ` Darren Hart
  2009-03-30 21:40   ` Eric Dumazet
  2009-03-30 21:38 ` [tip PATCH v6 7/8] RFC: futex: Add requeue_futex() call Darren Hart
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Darren Hart,
	Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

Currently restart is only used if there is a timeout.  The requeue_pi
functionality requires restarting to futex_lock_pi() on signal after
wakeup in futex_wait_requeue_pi() regardless of if there was a timeout
or not.  Using 0 for the timeout value is confusing as that could
indicate an expired timer.  The flag makes this explicit.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 kernel/futex.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index dd05803..f54faf5 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1254,6 +1254,7 @@ handle_fault:
  */
 #define FLAGS_SHARED		0x01
 #define FLAGS_CLOCKRT		0x02
+#define FLAGS_HAS_TIMEOUT	0x04
 
 static long futex_wait_restart(struct restart_block *restart);
 
@@ -1493,7 +1494,7 @@ retry_private:
 	restart->futex.val = val;
 	restart->futex.time = abs_time->tv64;
 	restart->futex.bitset = bitset;
-	restart->futex.flags = 0;
+	restart->futex.flags = FLAGS_HAS_TIMEOUT;
 
 	if (fshared)
 		restart->futex.flags |= FLAGS_SHARED;
@@ -1517,13 +1518,16 @@ static long futex_wait_restart(struct restart_block *restart)
 {
 	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
 	int fshared = 0;
-	ktime_t t;
+	ktime_t t, *tp = NULL;
 
-	t.tv64 = restart->futex.time;
+	if (restart->futex.flags | FLAGS_HAS_TIMEOUT) {
+		t.tv64 = restart->futex.time;
+		tp = &t;
+	}
 	restart->fn = do_no_restart_syscall;
 	if (restart->futex.flags & FLAGS_SHARED)
 		fshared = 1;
-	return (long)futex_wait(uaddr, fshared, restart->futex.val, &t,
+	return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
 				restart->futex.bitset,
 				restart->futex.flags & FLAGS_CLOCKRT);
 }


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip PATCH v6 7/8] RFC: futex: Add requeue_futex() call
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
                   ` (5 preceding siblings ...)
  2009-03-30 21:38 ` [tip PATCH v6 6/8] RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags Darren Hart
@ 2009-03-30 21:38 ` Darren Hart
  2009-03-30 21:38 ` [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls Darren Hart
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Darren Hart,
	Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

futex_requeue() is getting a bit long-winded, and will be getting more so after
the requeue_pi patch.  Factor out the actual requeueing into a nicely contained
inline function to reduce function length and improve legibility.

Changelog:
V6: -Initial version

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 kernel/futex.c |   43 +++++++++++++++++++++++++++++--------------
 1 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f54faf5..61a8a8b 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -942,6 +942,34 @@ out:
 	return ret;
 }
 
+/**
+ * requeue_futex() - Requeue a futex_q from one hb to another
+ * @q:		the futex_q to requeue
+ * @hb1:	the source hash_bucket
+ * @hb2:	the target hash_bucket
+ * @key2:	the new key for the requeued futex_q
+ */
+static inline
+void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
+		   struct futex_hash_bucket *hb2, union futex_key *key2)
+{
+
+	/*
+	 * If key1 and key2 hash to the same bucket, no need to
+	 * requeue.
+	 */
+	if (likely(&hb1->chain != &hb2->chain)) {
+		plist_del(&q->list, &hb1->chain);
+		plist_add(&q->list, &hb2->chain);
+		q->lock_ptr = &hb2->lock;
+#ifdef CONFIG_DEBUG_PI_LIST
+		q->list.plist.lock = &hb2->lock;
+#endif
+	}
+	get_futex_key_refs(key2);
+	q->key = *key2;
+}
+
 /*
  * Requeue all waiters hashed on one physical page to another
  * physical page.
@@ -1001,20 +1029,7 @@ retry_private:
 		if (++ret <= nr_wake) {
 			wake_futex(this);
 		} else {
-			/*
-			 * If key1 and key2 hash to the same bucket, no need to
-			 * requeue.
-			 */
-			if (likely(head1 != &hb2->chain)) {
-				plist_del(&this->list, &hb1->chain);
-				plist_add(&this->list, &hb2->chain);
-				this->lock_ptr = &hb2->lock;
-#ifdef CONFIG_DEBUG_PI_LIST
-				this->list.plist.lock = &hb2->lock;
-#endif
-			}
-			this->key = key2;
-			get_futex_key_refs(&key2);
+			requeue_futex(this, hb1, hb2, &key2);
 			drop_count++;
 
 			if (ret - nr_wake >= nr_requeue)


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
                   ` (6 preceding siblings ...)
  2009-03-30 21:38 ` [tip PATCH v6 7/8] RFC: futex: Add requeue_futex() call Darren Hart
@ 2009-03-30 21:38 ` Darren Hart
  2009-03-30 21:47   ` Eric Dumazet
  2009-03-31  7:30   ` Thomas Gleixner
  2009-03-30 21:55 ` glibc hacks for requeue_pi Darren Hart
  2009-03-31  2:09 ` [tip PATCH v6 0/8] requeue pi implementation Steven Rostedt
  9 siblings, 2 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Darren Hart,
	Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

PI Futexes and their underlying rt_mutex cannot be left ownerless if there are
pending waiters as this will break the PI boosting logic, so the standard
requeue commands aren't sufficient.  The new commands properly manage pi futex
ownership by ensuring a futex with waiters has an owner at all times.  This
will allow glibc to properly handle pi mutexes with pthread_condvars.

The approach taken here is to create two new futex op codes:

FUTEX_WAIT_REQUEUE_PI:
Tasks will use this op code to wait on a futex (such as a non-pi waitqueue)
and wake after they have been requeued to a pi futex.  Prior to returning to
userspace, they will acquire this pi futex (and the underlying rt_mutex).

futex_wait_requeue_pi() is the result of a high speed collision between
futex_wait() and futex_lock_pi() (with the first part of futex_lock_pi() being
done by futex_proxy_trylock_atomic() on behalf of the top_waiter).

FUTEX_REQUEUE_PI (and FUTEX_CMP_REQUEUE_PI):
This call must be used to wake tasks waiting with FUTEX_WAIT_REQUEUE_PI,
regardless of how many tasks the caller intends to wake or requeue.
pthread_cond_broadcast() should call this with nr_wake=1 and
nr_requeue=INT_MAX.  pthread_cond_signal() should call this with nr_wake=1 and
nr_requeue=0.  The reason being we need both callers to get the benefit of the
futex_proxy_trylock_atomic() routine.  futex_requeue() also enqueues the
top_waiter on the rt_mutex via rt_mutex_start_proxy_lock().

Changelog:
V6: -Moved non requeue_pi related fixes/changes into separate patches
    -Make use of new double_unlock_hb()
    -Futex key management updates
    -Removed unnecessary futex_requeue_pi_cleanup() routine
    -Return -EINVAL if futex_wake is called with q.rt_waiter != NULL
    -Rewrote futex_wait_requeue_pi() wakeup logic
    -Rewrote requeue/wakeup loop
    -Renamed futex_requeue_pi_init() to futex_proxy_trylock_atomic()
    -Handle third party owner, removed -EMORON :-(
    -Comment updates
V5: -Update futex_requeue to allow for nr_requeue == 0
    -Whitespace cleanup
    -Added task_count var to futex_requeue to avoid confusion between
     ret, res, and ret used to count wakes and requeues
V4: -Cleanups to pass checkpatch.pl
    -Added missing goto out; in futex_wait_requeue_pi()
    -Moved rt_mutex_handle_wakeup to the rt_mutex_enqueue_task patch as they
     are a functional pair.
    -Fixed several error exit paths that failed to unqueue the futex_q, which
     not only would leave the futex_q on the hb, but would have caused an exit
     race with the waiter since they weren't synchonized on the hb lock.  Thanks
     Sripathi for catching this.
    -Fix pi_state handling in futex_requeue
    -Several other minor fixes to futex_requeue_pi
    -add requeue_futex function and force the requeue in requeue_pi even for the
     task we wake in the requeue loop
    -refill the pi state cache at the beginning of futex_requeue for requeue_pi
    -have futex_requeue_pi_init ensure it stores off the pi_state for use in
     futex_requeue
    - Delayed starting the hrtimer until after TASK_INTERRUPTIBLE is set
    - Fixed NULL pointer bug when futex_wait_requeue_pi() has no timer and
      receives a signal after waking on uaddr2.  Added has_timeout to the
      restart->futex structure.
V3: -Added FUTEX_CMP_REQUEUE_PI op
    -Put fshared support back.  So long as it is encoded in the op code, we
     assume both the uaddr's are either private or share, but not mixed.
    -Fixed access to expected value of uaddr2 in futex_wait_requeue_pi()
V2: -Added rt_mutex enqueueing to futex_requeue_pi_init
    -Updated fault handling and exit logic
V1: -Initial verion

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 include/linux/futex.h       |    8 +
 include/linux/thread_info.h |    3 
 kernel/futex.c              |  533 +++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 524 insertions(+), 20 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 3bf5bb5..b05519c 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -23,6 +23,9 @@ union ktime;
 #define FUTEX_TRYLOCK_PI	8
 #define FUTEX_WAIT_BITSET	9
 #define FUTEX_WAKE_BITSET	10
+#define FUTEX_WAIT_REQUEUE_PI	11
+#define FUTEX_REQUEUE_PI	12
+#define FUTEX_CMP_REQUEUE_PI	13
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
@@ -38,6 +41,11 @@ union ktime;
 #define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAIT_BITSET_PRIVATE	(FUTEX_WAIT_BITS | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_BITSET_PRIVATE	(FUTEX_WAKE_BITS | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
+					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_REQUEUE_PI_PRIVATE	(FUTEX_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
+					 FUTEX_PRIVATE_FLAG)
 
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index e6b820f..a8cc4e1 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -21,13 +21,14 @@ struct restart_block {
 		struct {
 			unsigned long arg0, arg1, arg2, arg3;
 		};
-		/* For futex_wait */
+		/* For futex_wait and futex_wait_requeue_pi */
 		struct {
 			u32 *uaddr;
 			u32 val;
 			u32 flags;
 			u32 bitset;
 			u64 time;
+			u32 *uaddr2;
 		} futex;
 		/* For nanosleep */
 		struct {
diff --git a/kernel/futex.c b/kernel/futex.c
index 61a8a8b..a673bb3 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -19,6 +19,10 @@
  *  PRIVATE futexes by Eric Dumazet
  *  Copyright (C) 2007 Eric Dumazet <dada1@cosmosbay.com>
  *
+ *  Requeue-PI support by Darren Hart <dvhltc@us.ibm.com>
+ *  Copyright (C) IBM Corporation, 2009
+ *  Thanks to Thomas Gleixner for conceptual design and careful reviews.
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -109,6 +113,9 @@ struct futex_q {
 	struct futex_pi_state *pi_state;
 	struct task_struct *task;
 
+	/* rt_waiter storage for requeue_pi: */
+	struct rt_mutex_waiter *rt_waiter;
+
 	/* Bitset for the optional bitmasked wakeup */
 	u32 bitset;
 };
@@ -829,7 +836,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 
 	plist_for_each_entry_safe(this, next, head, list) {
 		if (match_futex (&this->key, &key)) {
-			if (this->pi_state) {
+			if (this->pi_state || this->rt_waiter) {
 				ret = -EINVAL;
 				break;
 			}
@@ -970,20 +977,116 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
 	q->key = *key2;
 }
 
-/*
- * Requeue all waiters hashed on one physical page to another
- * physical page.
+/**
+ * futex_proxy_trylock_atomic() - Attempt an atomic lock for the top waiter
+ * @pifutex:	the user address of the to futex
+ * @hb1:	the from futex hash bucket, must be locked by the caller
+ * @hb2:	the to futex hash bucket, must be locked by the caller
+ * @key1:	the from futex key
+ * @key2:	the to futex key
+ *
+ * Try and get the lock on behalf of the top waiter if we can do it atomically.
+ * Wake the top waiter if we succeed.  hb1 and hb2 must be held by the caller.
+ *
+ * Faults occur for two primary reasons at this point:
+ * 1) The address isn't mapped
+ * 2) The address isn't writeable
+ *
+ * We return EFAULT on either of these cases and rely on the caller to handle
+ * them.
+ *
+ * Returns:
+ *  0 - failed to acquire the lock atomicly
+ *  1 - acquired the lock
+ * <0 - error
+ */
+static int futex_proxy_trylock_atomic(u32 __user *pifutex,
+				 struct futex_hash_bucket *hb1,
+				 struct futex_hash_bucket *hb2,
+				 union futex_key *key1, union futex_key *key2,
+				 struct futex_pi_state **ps)
+{
+	struct futex_q *top_waiter;
+	u32 curval;
+	int ret;
+
+	if (get_futex_value_locked(&curval, pifutex))
+		return -EFAULT;
+
+	top_waiter = futex_top_waiter(hb1, key1);
+
+	/* There are no waiters, nothing for us to do. */
+	if (!top_waiter)
+		return 0;
+
+	/*
+	 * Either take the lock for top_waiter or set the FUTEX_WAITERS bit.
+	 * The pi_state is returned in ps in contended cases.
+	 */
+	ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task);
+	if (ret == 1) {
+		/*
+		 * Set the top_waiter key for the requeue target futex so the
+		 * waiter can detect the wakeup on the right futex, but remove
+		 * it from the hb so it can detect atomic lock acquisition.
+		 */
+		drop_futex_key_refs(&top_waiter->key);
+		get_futex_key_refs(key2);
+		top_waiter->key = *key2;
+		WARN_ON(plist_node_empty(&top_waiter->list));
+		plist_del(&top_waiter->list, &top_waiter->list.plist);
+		/*
+		 * FIXME: wake_futex() wakes first, then nulls the lock_ptr,
+		 * and uses a memory barrier.  Do we need to?
+		 */
+		top_waiter->lock_ptr = NULL;
+		wake_up(&top_waiter->waiter);
+	}
+
+	return ret;
+}
+
+/**
+ * futex_requeue() - Requeue waiters from uaddr1 to uaddr2
+ * uaddr1:	source futex user address
+ * uaddr2:	target futex user address
+ * nr_wake:	number of waiters to wake (must be 1 for requeue_pi)
+ * nr_requeue:	number of waiters to requeue (0-INT_MAX)
+ * requeue_pi:	if we are attempting to requeue from a non-pi futex to a
+ * 		pi futex (pi to pi requeue is not supported)
+ *
+ * Requeue waiters on uaddr1 to uaddr2. In the requeue_pi case, try to acquire
+ * uaddr2 atomically on behalf of the top waiter.
+ *
+ * Returns:
+ * >=0: on success, the number of tasks requeued or woken
+ *  <0: on error
  */
 static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
-			 int nr_wake, int nr_requeue, u32 *cmpval)
+			 int nr_wake, int nr_requeue, u32 *cmpval,
+			 int requeue_pi)
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
+	int drop_count = 0, task_count = 0, ret;
+	struct futex_pi_state *pi_state = NULL;
 	struct futex_hash_bucket *hb1, *hb2;
 	struct plist_head *head1;
 	struct futex_q *this, *next;
-	int ret, drop_count = 0;
+	u32 curval2;
+
+	if (requeue_pi) {
+		if (refill_pi_state_cache())
+			return -ENOMEM;
+		if (nr_wake != 1)
+			return -EINVAL;
+	}
 
 retry:
+	if (pi_state != NULL) {
+		free_pi_state(pi_state);
+		pi_state = NULL;
+	}
+
 	ret = get_futex_key(uaddr1, fshared, &key1);
 	if (unlikely(ret != 0))
 		goto out;
@@ -1022,19 +1125,92 @@ retry_private:
 		}
 	}
 
+	if (requeue_pi) {
+		/* Attempt to acquire uaddr2 and wake the top_waiter. */
+		ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
+						 &key2, &pi_state);
+
+		/*
+		 * At this point the top_waiter has either taken uaddr2 or is
+		 * waiting on it.  If the former, then the pi_state will not
+		 * exist yet, look it up one more time to ensure we have a
+		 * reference to it.
+		 */
+		if (ret == 1 && !pi_state) {
+			task_count++;
+			ret = get_futex_value_locked(&curval2, uaddr2);
+			if (!ret)
+				ret = lookup_pi_state(curval2, hb2, &key2,
+						      &pi_state);
+		}
+
+		switch (ret) {
+		case 0:
+			break;
+		case -EFAULT:
+			double_unlock_hb(hb1, hb2);
+			put_futex_key(fshared, &key2);
+			put_futex_key(fshared, &key1);
+			ret = get_user(curval2, uaddr2);
+			if (!ret)
+				goto retry;
+			goto out;
+		case -EAGAIN:
+			/* The owner was exiting, try again. */
+			double_unlock_hb(hb1, hb2);
+			put_futex_key(fshared, &key2);
+			put_futex_key(fshared, &key1);
+			cond_resched();
+			goto retry;
+		default:
+			goto out_unlock;
+		}
+	}
+
 	head1 = &hb1->chain;
 	plist_for_each_entry_safe(this, next, head1, list) {
-		if (!match_futex (&this->key, &key1))
+		if (task_count - nr_wake >= nr_requeue)
+			break;
+
+		if (!match_futex(&this->key, &key1))
 			continue;
-		if (++ret <= nr_wake) {
+
+		/* This can go after we're satisfied with testing. */
+		if (!requeue_pi)
+			WARN_ON(this->rt_waiter);
+
+		/*
+		 * Wake nr_wake waiters.  For requeue_pi, if we acquired the
+		 * lock, we already woke the top_waiter.  If not, it will be
+		 * woken by futex_unlock_pi().
+		 */
+		if (++task_count <= nr_wake && !requeue_pi) {
 			wake_futex(this);
-		} else {
-			requeue_futex(this, hb1, hb2, &key2);
-			drop_count++;
+			continue;
+		}
 
-			if (ret - nr_wake >= nr_requeue)
-				break;
+		/*
+		 * Requeue nr_requeue waiters and possibly one more in the case
+		 * of requeue_pi if we couldn't acquire the lock atomically.
+		 */
+		if (requeue_pi) {
+			/* This can go after we're satisfied with testing. */
+			WARN_ON(!this->rt_waiter);
+
+			/* Prepare the waiter to take the rt_mutex. */
+			atomic_inc(&pi_state->refcount);
+			this->pi_state = pi_state;
+			ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
+							this->rt_waiter,
+							this->task, 1);
+			if (ret) {
+				this->pi_state = NULL;
+				free_pi_state(pi_state);
+				goto out_unlock;
+			}
 		}
+		requeue_futex(this, hb1, hb2, &key2);
+		drop_count++;
 	}
 
 out_unlock:
@@ -1049,7 +1225,9 @@ out_put_keys:
 out_put_key1:
 	put_futex_key(fshared, &key1);
 out:
-	return ret;
+	if (pi_state != NULL)
+		free_pi_state(pi_state);
+	return ret ? ret : task_count;
 }
 
 /* The key must be already stored in q->key. */
@@ -1272,6 +1450,8 @@ handle_fault:
 #define FLAGS_HAS_TIMEOUT	0x04
 
 static long futex_wait_restart(struct restart_block *restart);
+static long futex_wait_requeue_pi_restart(struct restart_block *restart);
+static long futex_lock_pi_restart(struct restart_block *restart);
 
 /**
  * finish_futex_lock_pi() - Post lock pi_state and corner case management
@@ -1419,6 +1599,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
 
 	q.pi_state = NULL;
 	q.bitset = bitset;
+	q.rt_waiter = NULL;
 
 	if (abs_time) {
 		unsigned long slack;
@@ -1575,6 +1756,7 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
 	}
 
 	q.pi_state = NULL;
+	q.rt_waiter = NULL;
 retry:
 	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, fshared, &q.key);
@@ -1670,6 +1852,20 @@ uaddr_faulted:
 	goto retry;
 }
 
+static long futex_lock_pi_restart(struct restart_block *restart)
+{
+	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
+	ktime_t t, *tp = NULL;
+	int fshared = restart->futex.flags & FLAGS_SHARED;
+
+	if (restart->futex.flags | FLAGS_HAS_TIMEOUT) {
+		t.tv64 = restart->futex.time;
+		tp = &t;
+	}
+	restart->fn = do_no_restart_syscall;
+
+	return (long)futex_lock_pi(uaddr, fshared, restart->futex.val, tp, 0);
+}
 
 /*
  * Userspace attempted a TID -> 0 atomic transition, and failed.
@@ -1772,6 +1968,290 @@ pi_faulted:
 	return ret;
 }
 
+/**
+ * futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
+ * @uaddr:	the futex we initialyl wait on (non-pi)
+ * @fshared:	whether the futexes are shared (1) or not (0).  They must be
+ * 		the same type, no requeueing from private to shared, etc.
+ * @val:	the expected value of uaddr
+ * @abs_time:	absolute timeout
+ * @bitset:	32 bit wakeup bitset set by userspace, defaults to all.
+ * @clockrt:	whether to use CLOCK_REALTIME (1) or CLOCK_MONOTONIC (0)
+ * @uaddr2:	the pi futex we will take prior to returning to user-space
+ *
+ * The caller will wait on uaddr and will be requeued by futex_requeue() to
+ * uaddr2 which must be PI aware.  Normal wakeup will wake on uaddr2 and
+ * complete the acquisition of the rt_mutex prior to returning to userspace.
+ * This ensures the rt_mutex maintains an owner when it has waiters; without
+ * one, the pi logic wouldn't know which task to boost/deboost, if there was a
+ * need to.
+ *
+ * We call schedule in futex_wait_queue_me() when we enqueue and return there
+ * via the following:
+ * 1) wakeup on uaddr2 after an atomic lock acquisition by futex_requeue()
+ * 2) wakeup on uaddr2 after a requeue and subsequent unlock
+ * 3) signal (before or after requeue)
+ * 4) timeout (before or after requeue)
+ *
+ * If 3, we setup a restart_block with futex_wait_requeue_pi() as the function.
+ *
+ * If 2, we may then block on trying to take the rt_mutex and return via:
+ * 5) successful lock
+ * 6) signal
+ * 7) timeout
+ * 8) other lock acquisition failure
+ *
+ * If 6, we setup a restart_block with futex_lock_pi() as the function.
+ *
+ * If 4 or 7, we cleanup and return with -ETIMEDOUT.
+ *
+ * Returns:
+ *  0 - On success
+ * <0 - On error
+ */
+static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
+				 u32 val, ktime_t *abs_time, u32 bitset,
+				 int clockrt, u32 __user *uaddr2)
+{
+	struct hrtimer_sleeper timeout, *to = NULL;
+	struct rt_mutex_waiter rt_waiter;
+	struct restart_block *restart;
+	struct futex_hash_bucket *hb;
+	struct rt_mutex *pi_mutex;
+	union futex_key key2;
+	struct futex_q q;
+	u32 uval;
+	int ret;
+
+	if (!bitset)
+		return -EINVAL;
+
+	if (abs_time) {
+		to = &timeout;
+		hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
+				      CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+		hrtimer_init_sleeper(to, current);
+		hrtimer_set_expires_range_ns(&to->timer, *abs_time,
+					     current->timer_slack_ns);
+	}
+
+	/*
+	 * The waiter is allocated on our stack, manipulated by the requeue
+	 * code while we sleep on uaddr.
+	 */
+	debug_rt_mutex_init_waiter(&rt_waiter);
+	rt_waiter.task = NULL;
+
+	q.pi_state = NULL;
+	q.bitset = bitset;
+	q.rt_waiter = &rt_waiter;
+
+retry:
+	q.key = FUTEX_KEY_INIT;
+	ret = get_futex_key(uaddr, fshared, &q.key);
+	if (unlikely(ret != 0))
+		goto out;
+
+	key2 = FUTEX_KEY_INIT;
+	ret = get_futex_key(uaddr2, fshared, &key2);
+	if (unlikely(ret != 0)) {
+		put_futex_key(fshared, &q.key);
+		goto out;
+	}
+
+	hb = queue_lock(&q);
+
+	/*
+	 * Access the page AFTER the hash-bucket is locked.
+	 * Order is important:
+	 *
+	 *   Userspace waiter: val = var; if (cond(val)) futex_wait(&var, val);
+	 *   Userspace waker:  if (cond(var)) { var = new; futex_wake(&var); }
+	 *
+	 * The basic logical guarantee of a futex is that it blocks ONLY
+	 * if cond(var) is known to be true at the time of blocking, for
+	 * any cond.  If we queued after testing *uaddr, that would open
+	 * a race condition where we could block indefinitely with
+	 * cond(var) false, which would violate the guarantee.
+	 *
+	 * A consequence is that futex_wait() can return zero and absorb
+	 * a wakeup when *uaddr != val on entry to the syscall.  This is
+	 * rare, but normal.
+	 */
+	ret = get_futex_value_locked(&uval, uaddr);
+
+	if (unlikely(ret)) {
+		queue_unlock(&q, hb);
+		put_futex_key(fshared, &q.key);
+		put_futex_key(fshared, &key2);
+
+		ret = get_user(uval, uaddr);
+		if (!ret)
+			goto retry;
+		goto out;
+	}
+
+	/* Only actually queue if *uaddr contained val.  */
+	ret = -EWOULDBLOCK;
+	if (uval != val) {
+		queue_unlock(&q, hb);
+		put_futex_key(fshared, &q.key);
+		put_futex_key(fshared, &key2);
+		goto out;
+	}
+
+	/* Queue the futex_q, drop the hb lock, wait for wakeup. */
+	futex_wait_queue_me(hb, &q, to);
+
+	/*
+	 * Ensure the requeue is atomic to avoid races while we process the
+	 * wakeup.  We only need to hold hb->lock to ensure atomicity as the
+	 * wakeup code can't change q.key from uaddr to uaddr2 if we hold that
+	 * lock. It can't be requeued from uaddr2 to something else since we
+	 * don't support a PI aware source futex for requeue.
+	 */
+	spin_lock(&hb->lock);
+	if (!match_futex(&q.key, &key2)) {
+		WARN_ON(q.lock_ptr && (&hb->lock != q.lock_ptr));
+		/*
+		 * We were not requeued, handle wakeup from futex1 (uaddr).  We
+		 * cannot have been unqueued and already hold the lock, no need
+		 * to call unqueue_me, just do it directly.
+		 */
+		plist_del(&q.list, &q.list.plist);
+		drop_futex_key_refs(&q.key);
+
+		ret = -ETIMEDOUT;
+		if (to && !to->task) {
+			spin_unlock(&hb->lock);
+			goto out_put_keys;
+		}
+
+		/*
+		 * We expect signal_pending(current), but another thread may
+		 * have handled it for us already.
+		 */
+		ret = -ERESTARTSYS;
+		if (!abs_time) {
+			spin_unlock(&hb->lock);
+			goto out_put_keys;
+		}
+
+		restart = &current_thread_info()->restart_block;
+		restart->fn = futex_wait_requeue_pi_restart;
+		restart->futex.uaddr = (u32 *)uaddr;
+		restart->futex.val = val;
+		restart->futex.time = abs_time->tv64;
+		restart->futex.bitset = bitset;
+		restart->futex.flags = 0;
+		restart->futex.uaddr2 = (u32 *)uaddr2;
+		restart->futex.flags = FLAGS_HAS_TIMEOUT;
+
+		if (fshared)
+			restart->futex.flags |= FLAGS_SHARED;
+		if (clockrt)
+			restart->futex.flags |= FLAGS_CLOCKRT;
+
+		ret = -ERESTART_RESTARTBLOCK;
+
+		spin_unlock(&hb->lock);
+		goto out_put_keys;
+	}
+	spin_unlock(&hb->lock);
+
+	ret = 0;
+	/*
+	 * Check if the waker acquired the second futex for us. If the lock_ptr
+	 * is NULL, but our key is key2, then the requeue target futex was
+	 * uncontended and the waker gave it to us.  This is safe without a lock
+	 * as futex_requeue() will not release the hb lock until after it's
+	 * nulled the lock_ptr and removed us from the hb.
+	 */
+	if (!q.lock_ptr)
+		goto out_put_keys;
+
+	/*
+	 * At this point we have been requeued.  We have been woken up by
+	 * futex_unlock_pi(), a timeout, or a signal, but not futex_requeue().
+	 * futex_unlock_pi() will not destroy the lock_ptr nor the pi_state.
+	 */
+	WARN_ON(!&q.pi_state);
+	pi_mutex = &q.pi_state->pi_mutex;
+	ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
+	debug_rt_mutex_free_waiter(&waiter);
+
+	spin_lock(q.lock_ptr);
+	ret = finish_futex_lock_pi(uaddr, fshared, &q, ret);
+
+	/* Unqueue and drop the lock. */
+	unqueue_me_pi(&q);
+
+	/*
+	 * If fixup_pi_state_owner() faulted and was unable to handle the
+	 * fault, unlock it and return the fault to userspace.
+	 */
+	if (ret == -EFAULT) {
+		if (rt_mutex_owner(pi_mutex) == current)
+			rt_mutex_unlock(pi_mutex);
+	} else if (ret == -EINTR) {
+		if (get_user(uval, uaddr2)) {
+			ret = -EFAULT;
+			goto out_put_keys;
+		}
+
+		/*
+		 * We've already been requeued, so restart by calling
+		 * futex_lock_pi() directly, rather then returning to this
+		 * function.
+		 */
+		restart = &current_thread_info()->restart_block;
+		restart->fn = futex_lock_pi_restart;
+		restart->futex.uaddr = (u32 *)uaddr2;
+		restart->futex.val = uval;
+		restart->futex.flags = 0;
+		if (abs_time) {
+			restart->futex.flags |= FLAGS_HAS_TIMEOUT;
+			restart->futex.time = abs_time->tv64;
+		}
+
+		if (fshared)
+			restart->futex.flags |= FLAGS_SHARED;
+		if (clockrt)
+			restart->futex.flags |= FLAGS_CLOCKRT;
+		ret = -ERESTART_RESTARTBLOCK;
+	}
+
+out_put_keys:
+	put_futex_key(fshared, &q.key);
+	put_futex_key(fshared, &key2);
+
+out:
+	if (to) {
+		hrtimer_cancel(&to->timer);
+		destroy_hrtimer_on_stack(&to->timer);
+	}
+	return ret;
+}
+
+static long futex_wait_requeue_pi_restart(struct restart_block *restart)
+{
+	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
+	u32 __user *uaddr2 = (u32 __user *)restart->futex.uaddr2;
+	int fshared = restart->futex.flags & FLAGS_SHARED;
+	int clockrt = restart->futex.flags & FLAGS_CLOCKRT;
+	ktime_t t, *tp = NULL;
+
+	if (restart->futex.flags | FLAGS_HAS_TIMEOUT) {
+		t.tv64 = restart->futex.time;
+		tp = &t;
+	}
+	restart->fn = do_no_restart_syscall;
+
+	return (long)futex_wait_requeue_pi(uaddr, fshared, restart->futex.val,
+					   tp, restart->futex.bitset, clockrt,
+					   uaddr2);
+}
+
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
  * thread exit time.
@@ -1994,7 +2474,7 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		fshared = 1;
 
 	clockrt = op & FUTEX_CLOCK_REALTIME;
-	if (clockrt && cmd != FUTEX_WAIT_BITSET)
+	if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
 		return -ENOSYS;
 
 	switch (cmd) {
@@ -2009,10 +2489,11 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		ret = futex_wake(uaddr, fshared, val, val3);
 		break;
 	case FUTEX_REQUEUE:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL);
+		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0);
 		break;
 	case FUTEX_CMP_REQUEUE:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3);
+		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
+				    0);
 		break;
 	case FUTEX_WAKE_OP:
 		ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
@@ -2029,6 +2510,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		if (futex_cmpxchg_enabled)
 			ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1);
 		break;
+	case FUTEX_WAIT_REQUEUE_PI:
+		val3 = FUTEX_BITSET_MATCH_ANY;
+		ret = futex_wait_requeue_pi(uaddr, fshared, val, timeout, val3,
+					    clockrt, uaddr2);
+		break;
+	case FUTEX_REQUEUE_PI:
+		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 1);
+		break;
+	case FUTEX_CMP_REQUEUE_PI:
+		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
+				    1);
+		break;
 	default:
 		ret = -ENOSYS;
 	}
@@ -2046,7 +2539,8 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 	int cmd = op & FUTEX_CMD_MASK;
 
 	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
-		      cmd == FUTEX_WAIT_BITSET)) {
+		      cmd == FUTEX_WAIT_BITSET ||
+		      cmd == FUTEX_WAIT_REQUEUE_PI)) {
 		if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
 			return -EFAULT;
 		if (!timespec_valid(&ts))
@@ -2058,10 +2552,11 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 		tp = &t;
 	}
 	/*
-	 * requeue parameter in 'utime' if cmd == FUTEX_REQUEUE.
+	 * requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
 	 * number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
 	 */
 	if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
+	    cmd == FUTEX_REQUEUE_PI || cmd == FUTEX_CMP_REQUEUE_PI ||
 	    cmd == FUTEX_WAKE_OP)
 		val2 = (u32) (unsigned long) utime;
 


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 6/8] RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags
  2009-03-30 21:38 ` [tip PATCH v6 6/8] RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags Darren Hart
@ 2009-03-30 21:40   ` Eric Dumazet
  2009-03-30 22:40     ` Darren Hart
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2009-03-30 21:40 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Thomas Gleixner, Sripathi Kodi, Peter Zijlstra,
	John Stultz, Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Ingo Molnar, Jakub Jelinek

Darren Hart a écrit :
> Currently restart is only used if there is a timeout.  The requeue_pi
> functionality requires restarting to futex_lock_pi() on signal after
> wakeup in futex_wait_requeue_pi() regardless of if there was a timeout
> or not.  Using 0 for the timeout value is confusing as that could
> indicate an expired timer.  The flag makes this explicit.
> 
> Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sripathi Kodi <sripathik@in.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: John Stultz <johnstul@us.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Dinakar Guniguntala <dino@in.ibm.com>
> Cc: Ulrich Drepper <drepper@redhat.com>
> Cc: Eric Dumazet <dada1@cosmosbay.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Jakub Jelinek <jakub@redhat.com>
> ---
> 
>  kernel/futex.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index dd05803..f54faf5 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -1254,6 +1254,7 @@ handle_fault:
>   */
>  #define FLAGS_SHARED		0x01
>  #define FLAGS_CLOCKRT		0x02
> +#define FLAGS_HAS_TIMEOUT	0x04
>  
>  static long futex_wait_restart(struct restart_block *restart);
>  
> @@ -1493,7 +1494,7 @@ retry_private:
>  	restart->futex.val = val;
>  	restart->futex.time = abs_time->tv64;
>  	restart->futex.bitset = bitset;
> -	restart->futex.flags = 0;
> +	restart->futex.flags = FLAGS_HAS_TIMEOUT;
>  
>  	if (fshared)
>  		restart->futex.flags |= FLAGS_SHARED;
> @@ -1517,13 +1518,16 @@ static long futex_wait_restart(struct restart_block *restart)
>  {
>  	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
>  	int fshared = 0;
> -	ktime_t t;
> +	ktime_t t, *tp = NULL;
>  
> -	t.tv64 = restart->futex.time;
> +	if (restart->futex.flags | FLAGS_HAS_TIMEOUT) {

You probably meant :

if (restart->futex.flags & FLAGS_HAS_TIMEOUT)

> +		t.tv64 = restart->futex.time;
> +		tp = &t;
> +	}
>  	restart->fn = do_no_restart_syscall;
>  	if (restart->futex.flags & FLAGS_SHARED)
>  		fshared = 1;
> -	return (long)futex_wait(uaddr, fshared, restart->futex.val, &t,
> +	return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
>  				restart->futex.bitset,
>  				restart->futex.flags & FLAGS_CLOCKRT);
>  }
> 
> 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls
  2009-03-30 21:38 ` [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls Darren Hart
@ 2009-03-30 21:47   ` Eric Dumazet
  2009-03-30 22:44     ` Darren Hart
  2009-03-31  7:30   ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Eric Dumazet @ 2009-03-30 21:47 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Thomas Gleixner, Sripathi Kodi, Peter Zijlstra,
	John Stultz, Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Ingo Molnar, Jakub Jelinek

Darren Hart a écrit :
> PI Futexes and their underlying rt_mutex cannot be left ownerless if there are
> pending waiters as this will break the PI boosting logic, so the standard
> requeue commands aren't sufficient.  The new commands properly manage pi futex
> ownership by ensuring a futex with waiters has an owner at all times.  This
> will allow glibc to properly handle pi mutexes with pthread_condvars.
> 
> The approach taken here is to create two new futex op codes:
> 
> FUTEX_WAIT_REQUEUE_PI:
> Tasks will use this op code to wait on a futex (such as a non-pi waitqueue)
> and wake after they have been requeued to a pi futex.  Prior to returning to
> userspace, they will acquire this pi futex (and the underlying rt_mutex).
> 
> futex_wait_requeue_pi() is the result of a high speed collision between
> futex_wait() and futex_lock_pi() (with the first part of futex_lock_pi() being
> done by futex_proxy_trylock_atomic() on behalf of the top_waiter).
> 
> FUTEX_REQUEUE_PI (and FUTEX_CMP_REQUEUE_PI):
> This call must be used to wake tasks waiting with FUTEX_WAIT_REQUEUE_PI,
> regardless of how many tasks the caller intends to wake or requeue.
> pthread_cond_broadcast() should call this with nr_wake=1 and
> nr_requeue=INT_MAX.  pthread_cond_signal() should call this with nr_wake=1 and
> nr_requeue=0.  The reason being we need both callers to get the benefit of the
> futex_proxy_trylock_atomic() routine.  futex_requeue() also enqueues the
> top_waiter on the rt_mutex via rt_mutex_start_proxy_lock().
> 
> Changelog:
> V6: -Moved non requeue_pi related fixes/changes into separate patches
>     -Make use of new double_unlock_hb()
>     -Futex key management updates
>     -Removed unnecessary futex_requeue_pi_cleanup() routine
>     -Return -EINVAL if futex_wake is called with q.rt_waiter != NULL
>     -Rewrote futex_wait_requeue_pi() wakeup logic
>     -Rewrote requeue/wakeup loop
>     -Renamed futex_requeue_pi_init() to futex_proxy_trylock_atomic()
>     -Handle third party owner, removed -EMORON :-(
>     -Comment updates
> V5: -Update futex_requeue to allow for nr_requeue == 0
>     -Whitespace cleanup
>     -Added task_count var to futex_requeue to avoid confusion between
>      ret, res, and ret used to count wakes and requeues
> V4: -Cleanups to pass checkpatch.pl
>     -Added missing goto out; in futex_wait_requeue_pi()
>     -Moved rt_mutex_handle_wakeup to the rt_mutex_enqueue_task patch as they
>      are a functional pair.
>     -Fixed several error exit paths that failed to unqueue the futex_q, which
>      not only would leave the futex_q on the hb, but would have caused an exit
>      race with the waiter since they weren't synchonized on the hb lock.  Thanks
>      Sripathi for catching this.
>     -Fix pi_state handling in futex_requeue
>     -Several other minor fixes to futex_requeue_pi
>     -add requeue_futex function and force the requeue in requeue_pi even for the
>      task we wake in the requeue loop
>     -refill the pi state cache at the beginning of futex_requeue for requeue_pi
>     -have futex_requeue_pi_init ensure it stores off the pi_state for use in
>      futex_requeue
>     - Delayed starting the hrtimer until after TASK_INTERRUPTIBLE is set
>     - Fixed NULL pointer bug when futex_wait_requeue_pi() has no timer and
>       receives a signal after waking on uaddr2.  Added has_timeout to the
>       restart->futex structure.
> V3: -Added FUTEX_CMP_REQUEUE_PI op
>     -Put fshared support back.  So long as it is encoded in the op code, we
>      assume both the uaddr's are either private or share, but not mixed.
>     -Fixed access to expected value of uaddr2 in futex_wait_requeue_pi()
> V2: -Added rt_mutex enqueueing to futex_requeue_pi_init
>     -Updated fault handling and exit logic
> V1: -Initial verion
> 
> Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sripathi Kodi <sripathik@in.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: John Stultz <johnstul@us.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Dinakar Guniguntala <dino@in.ibm.com>
> Cc: Ulrich Drepper <drepper@redhat.com>
> Cc: Eric Dumazet <dada1@cosmosbay.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Jakub Jelinek <jakub@redhat.com>
> ---
> 
>  include/linux/futex.h       |    8 +
>  include/linux/thread_info.h |    3 
>  kernel/futex.c              |  533 +++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 524 insertions(+), 20 deletions(-)
> 
> diff --git a/include/linux/futex.h b/include/linux/futex.h
> index 3bf5bb5..b05519c 100644
> --- a/include/linux/futex.h
> +++ b/include/linux/futex.h
> +static long futex_lock_pi_restart(struct restart_block *restart)
> +{
> +	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
> +	ktime_t t, *tp = NULL;
> +	int fshared = restart->futex.flags & FLAGS_SHARED;
> +
> +	if (restart->futex.flags | FLAGS_HAS_TIMEOUT) {

if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {

> +		t.tv64 = restart->futex.time;
> +		tp = &t;
> +	}
> +	restart->fn = do_no_restart_syscall;
> +
> +	return (long)futex_lock_pi(uaddr, fshared, restart->futex.val, tp, 0);
> +}
>  
>  /*
>   * Userspace attempted a TID -> 0 atomic transition, and failed.
> @@ -1772,6 +1968,290 @@ pi_faulted:
>  	return ret;
>  }
>  

> +
> +static long futex_wait_requeue_pi_restart(struct restart_block *restart)
> +{
> +	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
> +	u32 __user *uaddr2 = (u32 __user *)restart->futex.uaddr2;
> +	int fshared = restart->futex.flags & FLAGS_SHARED;
> +	int clockrt = restart->futex.flags & FLAGS_CLOCKRT;
> +	ktime_t t, *tp = NULL;
> +
> +	if (restart->futex.flags | FLAGS_HAS_TIMEOUT) {

if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {

> +		t.tv64 = restart->futex.time;
> +		tp = &t;
> +	}
> +	restart->fn = do_no_restart_syscall;
> +


Strange your compiler dit not complains...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re:  glibc hacks for requeue_pi
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
                   ` (7 preceding siblings ...)
  2009-03-30 21:38 ` [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls Darren Hart
@ 2009-03-30 21:55 ` Darren Hart
  2009-03-31  2:09 ` [tip PATCH v6 0/8] requeue pi implementation Steven Rostedt
  9 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-30 21:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 239 bytes --]

The attached patch is what we used to test glibc with requeue_pi.
As a prototype we deleted the assembly optimizations and just updated
the .c implementation.  Credit for the prototype goes to Sripathi Kodi.

Thanks,

--
Darren Hart







[-- Attachment #2: glibc_requeue_pi_prototype_v3.patch --]
[-- Type: text/x-diff, Size: 40397 bytes --]

diff -uprN glibc-2.5-20061008T1257_org/nptl/sysdeps/pthread/pthread_cond_broadcast.c glibc-2.5-20061008T1257/nptl/sysdeps/pthread/pthread_cond_broadcast.c
--- glibc-2.5-20061008T1257_org/nptl/sysdeps/pthread/pthread_cond_broadcast.c	2009-01-19 13:52:14.000000000 +0530
+++ glibc-2.5-20061008T1257/nptl/sysdeps/pthread/pthread_cond_broadcast.c	2009-03-26 14:38:53.000000000 +0530
@@ -27,11 +27,39 @@
 #include <shlib-compat.h>
 #include <kernel-features.h>
 
+#define FUTEX_CMP_REQUEUE 4
+#define FUTEX_REQUEUE_PI 12
+#define FUTEX_CMP_REQUEUE_PI   13
+
+/* Returns non-zero if error happened, zero if success.  */
+#define lll_futex_requeue(futexp, nr_wake, nr_move, mutex, val) \
+  ({                                                                          \
+    INTERNAL_SYSCALL_DECL (__err);                                            \
+    long int __ret;                                                           \
+    __ret = INTERNAL_SYSCALL (futex, __err, 6,                                \
+                              (futexp), FUTEX_CMP_REQUEUE, (nr_wake),         \
+                              (nr_move), (mutex), (val));                     \
+    INTERNAL_SYSCALL_ERROR_P (__ret, __err);                                  \
+  })
+
+
+/* Returns non-zero if error happened, zero if success.  */
+#define lll_futex_requeue_pi(futexp, nr_wake, nr_move, mutex, val) \
+  ({                                                                          \
+    INTERNAL_SYSCALL_DECL (__err);                                            \
+    long int __ret;                                                           \
+    __ret = INTERNAL_SYSCALL (futex, __err, 6,                                \
+                              (futexp), FUTEX_CMP_REQUEUE_PI, (nr_wake),         \
+                              (nr_move), (mutex), (val));                     \
+    INTERNAL_SYSCALL_ERROR_P (__ret, __err);                                  \
+  })
 
 int
 __pthread_cond_broadcast (cond)
      pthread_cond_t *cond;
 {
+  int err;
+
   /* Make sure we are alone.  */
   lll_mutex_lock (cond->__data.__lock);
 
@@ -57,9 +85,25 @@ __pthread_cond_broadcast (cond)
       pthread_mutex_t *mut = (pthread_mutex_t *) cond->__data.__mutex;
 
       /* XXX: Kernel so far doesn't support requeue to PI futex.  */
-      if (__builtin_expect (mut->__data.__kind & PTHREAD_MUTEX_PRIO_INHERIT_NP,
+/*      if (__builtin_expect (mut->__data.__kind & PTHREAD_MUTEX_PRIO_INHERIT_NP,
 			    0))
 	goto wake_all;
+*/
+	/* requeue_pi */
+	/* This code assumes  that lll_futex_requeue_pi returns 0 on success */
+      if (mut->__data.__kind & PTHREAD_MUTEX_PRIO_INHERIT_NP)
+	{
+	  err = lll_futex_requeue_pi (&cond->__data.__futex, 1,  INT_MAX,
+				       &mut->__data.__lock,  futex_val);
+	  /* FIXME: We now return any error other than ENOSYS to application */
+	  if (!err)
+	    return 0;
+	  else if (err != ENOSYS)
+	    return err;
+	  else /* It is ENOSYS */
+	    goto wake_all;
+	}
+
 
       /* lll_futex_requeue returns 0 for success and non-zero
 	 for errors.  */
diff -uprN glibc-2.5-20061008T1257_org/nptl/sysdeps/pthread/pthread_cond_signal.c glibc-2.5-20061008T1257/nptl/sysdeps/pthread/pthread_cond_signal.c
--- glibc-2.5-20061008T1257_org/nptl/sysdeps/pthread/pthread_cond_signal.c	2009-01-19 13:52:14.000000000 +0530
+++ glibc-2.5-20061008T1257/nptl/sysdeps/pthread/pthread_cond_signal.c	2009-03-26 14:43:08.000000000 +0530
@@ -27,11 +27,48 @@
 #include <shlib-compat.h>
 #include <kernel-features.h>
 
+#define FUTEX_OP_CLEAR_WAKE_IF_GT_ONE ((4 << 24) | 1)
+#define FUTEX_WAKE_OP           5
+
+#define FUTEX_REQUEUE_PI 12
+#define FUTEX_CMP_REQUEUE_PI   13
+
+/* Returns non-zero if error happened, zero if success.  */
+#define lll_futex_requeue_pi(futexp, nr_wake, nr_move, mutex, val) \
+  ({                                                                          \
+    INTERNAL_SYSCALL_DECL (__err);                                            \
+    long int __ret;                                                           \
+    __ret = INTERNAL_SYSCALL (futex, __err, 6,                                \
+                              (futexp), FUTEX_REQUEUE_PI, (nr_wake),         \
+                              (nr_move), (mutex), (val));                     \
+    INTERNAL_SYSCALL_ERROR_P (__ret, __err);                                  \
+  })
+
+
+/* Returns non-zero if error happened, zero if success.  */
+#define lll_futex_wake_unlock(futexp, nr_wake, nr_wake2, futexp2) \
+  ({                                                                          \
+    INTERNAL_SYSCALL_DECL (__err);                                            \
+    long int __ret;                                                           \
+                                                                              \
+    __ret = INTERNAL_SYSCALL (futex, __err, 6,                                \
+                              (futexp), FUTEX_WAKE_OP, (nr_wake),             \
+                              (nr_wake2), (futexp2),                          \
+                              FUTEX_OP_CLEAR_WAKE_IF_GT_ONE);                 \
+    INTERNAL_SYSCALL_ERROR_P (__ret, __err);                                  \
+  })
+
 
 int
 __pthread_cond_signal (cond)
      pthread_cond_t *cond;
 {
+  int err;
+
+  /* Needed for requeue_pi.  */
+  pthread_mutex_t *mut = (pthread_mutex_t *) cond->__data.__mutex;
+  int futex_val = cond->__data.__futex;
+
   /* Make sure we are alone.  */
   lll_mutex_lock (cond->__data.__lock);
 
@@ -42,6 +79,21 @@ __pthread_cond_signal (cond)
       ++cond->__data.__wakeup_seq;
       ++cond->__data.__futex;
 
+      /* requeue_pi: requeue 1 waiter */
+      /* This code assumes  that lll_futex_requeue_pi returns 0 on success */
+      if (mut->__data.__kind & PTHREAD_MUTEX_PRIO_INHERIT_NP)
+        { 
+	  err = lll_futex_requeue_pi (&cond->__data.__futex, 1, 0, &mut->__data.__lock,
+				       futex_val);
+	  /* FIXME: We now return any error other than ENOSYS to application 
+	   * In case of ENOSYS we fall through this code to call lll_futex_wake_unlock*/
+	  if (!err)
+	    goto requeued;
+	  else if (err != ENOSYS)
+	    return err;
+        }
+
+
       /* Wake one.  */
       if (! __builtin_expect (lll_futex_wake_unlock (&cond->__data.__futex, 1,
 						     1, &cond->__data.__lock),
@@ -51,6 +103,8 @@ __pthread_cond_signal (cond)
       lll_futex_wake (&cond->__data.__futex, 1);
     }
 
+requeued:
+
   /* We are done.  */
   lll_mutex_unlock (cond->__data.__lock);
 
diff -uprN glibc-2.5-20061008T1257_org/nptl/sysdeps/pthread/pthread_cond_timedwait.c glibc-2.5-20061008T1257/nptl/sysdeps/pthread/pthread_cond_timedwait.c
--- glibc-2.5-20061008T1257_org/nptl/sysdeps/pthread/pthread_cond_timedwait.c	2009-01-19 13:52:14.000000000 +0530
+++ glibc-2.5-20061008T1257/nptl/sysdeps/pthread/pthread_cond_timedwait.c	2009-03-26 14:43:08.000000000 +0530
@@ -39,6 +39,18 @@ struct _condvar_cleanup_buffer
   unsigned int bc_seq;
 };
 
+#define FUTEX_WAIT_REQUEUE_PI 11
+
+#define lll_futex_timed_wait_requeue_pi(futexp, val, time, mutex) 	      \
+  ({                                                                          \
+    INTERNAL_SYSCALL_DECL (__err);                                            \
+    long int __ret;                                                           \
+    __ret = INTERNAL_SYSCALL (futex, __err, 5,                                \
+                              (futexp), FUTEX_WAIT_REQUEUE_PI, (val), time, (mutex)); \
+    INTERNAL_SYSCALL_ERROR_P (__ret, __err)? -__ret : __ret;                  \
+  })
+
+
 int
 __pthread_cond_timedwait (cond, mutex, abstime)
      pthread_cond_t *cond;
@@ -151,9 +163,18 @@ __pthread_cond_timedwait (cond, mutex, a
       /* Enable asynchronous cancellation.  Required by the standard.  */
       cbuffer.oldtype = __pthread_enable_asynccancel ();
 
-      /* Wait until woken by signal or broadcast.  */
-      err = lll_futex_timed_wait (&cond->__data.__futex,
+      /* requeue_pi */
+      if (mutex->__data.__kind & PTHREAD_MUTEX_PRIO_INHERIT_NP)
+	{
+	  err = lll_futex_timed_wait_requeue_pi (&cond->__data.__futex,
+						 futex_val, &rt, &mutex->__data.__lock);
+	} else
+	{
+
+	  /* Wait until woken by signal or broadcast.  */
+	  err = lll_futex_timed_wait (&cond->__data.__futex,
 				  futex_val, &rt);
+	}
 
       /* Disable asynchronous cancellation.  */
       __pthread_disable_asynccancel (cbuffer.oldtype);
@@ -205,7 +226,11 @@ __pthread_cond_timedwait (cond, mutex, a
   __pthread_cleanup_pop (&buffer, 0);
 
   /* Get the mutex before returning.  */
-  err = __pthread_mutex_cond_lock (mutex);
+  /* With requeue_pi, the lock is held in the kernel, so just return 0 to application */
+  if (mutex->__data.__kind & PTHREAD_MUTEX_PRIO_INHERIT_NP)
+    return 0;
+  else
+    err = __pthread_mutex_cond_lock (mutex);
 
   return err ?: result;
 }
diff -uprN glibc-2.5-20061008T1257_org/nptl/sysdeps/pthread/pthread_cond_wait.c glibc-2.5-20061008T1257/nptl/sysdeps/pthread/pthread_cond_wait.c
--- glibc-2.5-20061008T1257_org/nptl/sysdeps/pthread/pthread_cond_wait.c	2009-01-19 13:52:14.000000000 +0530
+++ glibc-2.5-20061008T1257/nptl/sysdeps/pthread/pthread_cond_wait.c	2009-03-26 14:43:09.000000000 +0530
@@ -87,6 +87,17 @@ __condvar_cleanup (void *arg)
   __pthread_mutex_cond_lock (cbuffer->mutex);
 }
 
+#define FUTEX_WAIT_REQUEUE_PI 11
+
+#define lll_futex_wait_requeue_pi(futexp, val, mutex) \
+  ({                                                                          \
+    INTERNAL_SYSCALL_DECL (__err);                                            \
+    long int __ret;                                                           \
+    __ret = INTERNAL_SYSCALL (futex, __err, 5,                                \
+                              (futexp), FUTEX_WAIT_REQUEUE_PI, (val), 0, (mutex)); \
+    INTERNAL_SYSCALL_ERROR_P (__ret, __err)? -__ret : __ret;                  \
+  })
+
 
 int
 __pthread_cond_wait (cond, mutex)
@@ -145,9 +156,25 @@ __pthread_cond_wait (cond, mutex)
       /* Enable asynchronous cancellation.  Required by the standard.  */
       cbuffer.oldtype = __pthread_enable_asynccancel ();
 
+      /* requeue_pi */
+      if (mutex->__data.__kind & PTHREAD_MUTEX_PRIO_INHERIT_NP)
+        {
+	  err = lll_futex_wait_requeue_pi (&cond->__data.__futex,
+					futex_val, &mutex->__data.__lock);
+	    /* If error is ENOSYS, we fall through this code to call lll_futex_wait.
+	     * Return any other error to application, for now 
+	     * FIXME: Find out what other errors can come from the kernel */
+	    if (!err)
+	      goto requeued;
+	    else if (err != ENOSYS)
+	      return err;
+        }
+
       /* Wait until woken by signal or broadcast.  */
       lll_futex_wait (&cond->__data.__futex, futex_val);
 
+requeued:
+
       /* Disable asynchronous cancellation.  */
       __pthread_disable_asynccancel (cbuffer.oldtype);
 
@@ -173,6 +200,10 @@ __pthread_cond_wait (cond, mutex)
   /* If pthread_cond_destroy was called on this varaible already,
      notify the pthread_cond_destroy caller all waiters have left
      and it can be successfully destroyed.  */
+
+  /* FIXME: With requeue_pi, the below check won't work properly. Hence test cases
+   * that call cond_destroy may not work correctly */
+
   if (cond->__data.__total_seq == -1ULL
       && cond->__data.__nwaiters < (1 << COND_CLOCK_BITS))
     lll_futex_wake (&cond->__data.__nwaiters, 1);
@@ -184,7 +215,11 @@ __pthread_cond_wait (cond, mutex)
   __pthread_cleanup_pop (&buffer, 0);
 
   /* Get the mutex before returning.  */
-  return __pthread_mutex_cond_lock (mutex);
+  /* With requeue_pi, the lock is held in the kernel, so just return 0 to application */
+  if (mutex->__data.__kind & PTHREAD_MUTEX_PRIO_INHERIT_NP)
+    return 0;
+  else
+    return __pthread_mutex_cond_lock (mutex);
 }
 
 versioned_symbol (libpthread, __pthread_cond_wait, pthread_cond_wait,
diff -uprN glibc-2.5-20061008T1257_org/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S glibc-2.5-20061008T1257/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S
--- glibc-2.5-20061008T1257_org/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S	2009-01-19 13:52:14.000000000 +0530
+++ glibc-2.5-20061008T1257/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S	1970-01-01 05:30:00.000000000 +0530
@@ -1,143 +0,0 @@
-/* Copyright (C) 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
-   02111-1307 USA.  */
-
-#include <sysdep.h>
-#include <shlib-compat.h>
-#include <lowlevelcond.h>
-#include <kernel-features.h>
-#include <pthread-pi-defines.h>
-
-#ifdef UP
-# define LOCK
-#else
-# define LOCK lock
-#endif
-
-#define SYS_futex		202
-#define FUTEX_WAIT		0
-#define FUTEX_WAKE		1
-#define FUTEX_REQUEUE		3
-#define FUTEX_CMP_REQUEUE	4
-
-#define EINVAL			22
-
-
-	.text
-
-	/* int pthread_cond_broadcast (pthread_cond_t *cond) */
-	.globl	__pthread_cond_broadcast
-	.type	__pthread_cond_broadcast, @function
-	.align	16
-__pthread_cond_broadcast:
-
-	/* Get internal lock.  */
-	movl	$1, %esi
-	xorl	%eax, %eax
-	LOCK
-#if cond_lock == 0
-	cmpxchgl %esi, (%rdi)
-#else
-	cmpxchgl %esi, cond_lock(%rdi)
-#endif
-	jnz	1f
-
-2:	addq	$cond_futex, %rdi
-	movq	total_seq-cond_futex(%rdi), %r9
-	cmpq	wakeup_seq-cond_futex(%rdi), %r9
-	jna	4f
-
-	/* Cause all currently waiting threads to recognize they are
-	   woken up.  */
-	movq	%r9, wakeup_seq-cond_futex(%rdi)
-	movq	%r9, woken_seq-cond_futex(%rdi)
-	addq	%r9, %r9
-	movl	%r9d, (%rdi)
-	incl	broadcast_seq-cond_futex(%rdi)
-
-	/* Get the address of the mutex used.  */
-	movq	dep_mutex-cond_futex(%rdi), %r8
-
-	/* Unlock.  */
-	LOCK
-	decl	cond_lock-cond_futex(%rdi)
-	jne	7f
-
-8:	cmpq	$-1, %r8
-	je	9f
-
-	/* XXX: The kernel so far doesn't support requeue to PI futex.  */
-	testl	$PI_BIT, MUTEX_KIND(%r8)
-	jne	9f
-
-	/* Wake up all threads.  */
-	movl	$FUTEX_CMP_REQUEUE, %esi
-	movl	$SYS_futex, %eax
-	movl	$1, %edx
-	movl	$0x7fffffff, %r10d
-	syscall
-
-	/* For any kind of error, which mainly is EAGAIN, we try again
-	   with WAKE.  The general test also covers running on old
-	   kernels.  */
-	cmpq	$-4095, %rax
-	jae	9f
-
-10:	xorl	%eax, %eax
-	retq
-
-	.align	16
-	/* Unlock.  */
-4:	LOCK
-	decl	cond_lock-cond_futex(%rdi)
-	jne	5f
-
-6:	xorl	%eax, %eax
-	retq
-
-	/* Initial locking failed.  */
-1:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_lock_wait
-#if cond_lock != 0
-	subq	$cond_lock, %rdi
-#endif
-	jmp	2b
-
-	/* Unlock in loop requires wakeup.  */
-5:	addq	$cond_lock-cond_futex, %rdi
-	callq	__lll_mutex_unlock_wake
-	jmp	6b
-
-	/* Unlock in loop requires wakeup.  */
-7:	addq	$cond_lock-cond_futex, %rdi
-	callq	__lll_mutex_unlock_wake
-	subq	$cond_lock-cond_futex, %rdi
-	jmp	8b
-
-9:	/* The futex requeue functionality is not available.  */
-	movl	$0x7fffffff, %edx
-	movl	$FUTEX_WAKE, %esi
-	movl	$SYS_futex, %eax
-	syscall
-	jmp	10b
-	.size	__pthread_cond_broadcast, .-__pthread_cond_broadcast
-versioned_symbol (libpthread, __pthread_cond_broadcast, pthread_cond_broadcast,
-		  GLIBC_2_3_2)
diff -uprN glibc-2.5-20061008T1257_org/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S glibc-2.5-20061008T1257/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S
--- glibc-2.5-20061008T1257_org/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S	2009-01-19 13:52:14.000000000 +0530
+++ glibc-2.5-20061008T1257/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S	1970-01-01 05:30:00.000000000 +0530
@@ -1,127 +0,0 @@
-/* Copyright (C) 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
-   02111-1307 USA.  */
-
-#include <sysdep.h>
-#include <shlib-compat.h>
-#include <lowlevelcond.h>
-#include <kernel-features.h>
-
-#ifdef UP
-# define LOCK
-#else
-# define LOCK lock
-#endif
-
-#define SYS_futex		202
-#define FUTEX_WAIT		0
-#define FUTEX_WAKE		1
-#define FUTEX_WAKE_OP		5
-
-#define FUTEX_OP_CLEAR_WAKE_IF_GT_ONE	((4 << 24) | 1)
-
-#define EINVAL			22
-
-
-	.text
-
-	/* int pthread_cond_signal (pthread_cond_t *cond) */
-	.globl	__pthread_cond_signal
-	.type	__pthread_cond_signal, @function
-	.align	16
-__pthread_cond_signal:
-
-	/* Get internal lock.  */
-	movq	%rdi, %r8
-	movl	$1, %esi
-	xorl	%eax, %eax
-	LOCK
-#if cond_lock == 0
-	cmpxchgl %esi, (%rdi)
-#else
-	cmpxchgl %esi, cond_lock(%rdi)
-#endif
-	jnz	1f
-
-2:	addq	$cond_futex, %rdi
-	movq	total_seq(%r8), %rcx
-	cmpq	wakeup_seq(%r8), %rcx
-	jbe	4f
-
-	/* Bump the wakeup number.  */
-	addq	$1, wakeup_seq(%r8)
-	addl	$1, (%rdi)
-
-	/* Wake up one thread.  */
-	movl	$FUTEX_WAKE_OP, %esi
-	movl	$SYS_futex, %eax
-	movl	$1, %edx
-	movl	$1, %r10d
-#if cond_lock != 0
-	addq	$cond_lock, %r8
-#endif
-	movl	$FUTEX_OP_CLEAR_WAKE_IF_GT_ONE, %r9d
-	syscall
-#if cond_lock != 0
-	subq	$cond_lock, %r8
-#endif
-	/* For any kind of error, we try again with WAKE.
-	   The general test also covers running on old kernels.  */
-	cmpq	$-4095, %rax
-	jae	7f
-
-	xorl	%eax, %eax
-	retq
-
-7:	movl	$FUTEX_WAKE, %esi
-	movl	$SYS_futex, %eax
-	/* %rdx should be 1 already from $FUTEX_WAKE_OP syscall.
-	movl	$1, %edx  */
-	syscall
-
-	/* Unlock.  */
-4:	LOCK
-#if cond_lock == 0
-	decl	(%r8)
-#else
-	decl	cond_lock(%r8)
-#endif
-	jne	5f
-
-6:	xorl	%eax, %eax
-	retq
-
-	/* Initial locking failed.  */
-1:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_lock_wait
-#if cond_lock != 0
-	subq	$cond_lock, %rdi
-#endif
-	jmp	2b
-
-	/* Unlock in loop requires wakeup.  */
-5:
-	movq	%r8, %rdi
-	callq	__lll_mutex_unlock_wake
-	jmp	6b
-	.size	__pthread_cond_signal, .-__pthread_cond_signal
-versioned_symbol (libpthread, __pthread_cond_signal, pthread_cond_signal,
-		  GLIBC_2_3_2)
diff -uprN glibc-2.5-20061008T1257_org/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S glibc-2.5-20061008T1257/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S
--- glibc-2.5-20061008T1257_org/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S	2009-01-19 13:52:14.000000000 +0530
+++ glibc-2.5-20061008T1257/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S	1970-01-01 05:30:00.000000000 +0530
@@ -1,470 +0,0 @@
-/* Copyright (C) 2002, 2003, 2004, 2005 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
-   02111-1307 USA.  */
-
-#include <sysdep.h>
-#include <shlib-compat.h>
-#include <lowlevelcond.h>
-#include <pthread-errnos.h>
-
-#ifdef UP
-# define LOCK
-#else
-# define LOCK lock
-#endif
-
-#define SYS_futex		202
-#define FUTEX_WAIT		0
-#define FUTEX_WAKE		1
-
-/* For the calculation see asm/vsyscall.h.  */
-#define VSYSCALL_ADDR_vgettimeofday	0xffffffffff600000
-
-
-	.text
-
-/* int pthread_cond_timedwait (pthread_cond_t *cond, pthread_mutex_t *mutex,
-			       const struct timespec *abstime)  */
-	.globl	__pthread_cond_timedwait
-	.type	__pthread_cond_timedwait, @function
-	.align	16
-__pthread_cond_timedwait:
-.LSTARTCODE:
-	pushq	%r12
-.Lpush_r12:
-	pushq	%r13
-.Lpush_r13:
-	pushq	%r14
-.Lpush_r14:
-#define FRAME_SIZE 80
-	subq	$FRAME_SIZE, %rsp
-.Lsubq:
-
-	cmpq	$1000000000, 8(%rdx)
-	movl	$EINVAL, %eax
-	jae	18f
-
-	/* Stack frame:
-
-	   rsp + 80
-	            +--------------------------+
-	   rsp + 48 | cleanup buffer           |
-	            +--------------------------+
-	   rsp + 40 | old wake_seq value       |
-	            +--------------------------+
-	   rsp + 24 | timeout value            |
-	            +--------------------------+
-	   rsp + 16 | mutex pointer            |
-	            +--------------------------+
-	   rsp +  8 | condvar pointer          |
-	            +--------------------------+
-	   rsp +  4 | old broadcast_seq value  |
-	            +--------------------------+
-	   rsp +  0 | old cancellation mode    |
-	            +--------------------------+
-	*/
-
-	cmpq	$-1, dep_mutex(%rdi)
-
-	/* Prepare structure passed to cancellation handler.  */
-	movq	%rdi, 8(%rsp)
-	movq	%rsi, 16(%rsp)
-	movq	%rdx, %r13
-
-	je	22f
-	movq	%rsi, dep_mutex(%rdi)
-
-	/* Get internal lock.  */
-22:	movl	$1, %esi
-	xorl	%eax, %eax
-	LOCK
-#if cond_lock == 0
-	cmpxchgl %esi, (%rdi)
-#else
-	cmpxchgl %esi, cond_lock(%rdi)
-#endif
-	jnz	1f
-
-	/* Unlock the mutex.  */
-2:	movq	16(%rsp), %rdi
-	xorl	%esi, %esi
-	callq	__pthread_mutex_unlock_usercnt
-
-	testl	%eax, %eax
-	jne	16f
-
-	movq	8(%rsp), %rdi
-	incq	total_seq(%rdi)
-	incl	cond_futex(%rdi)
-	addl	$(1 << clock_bits), cond_nwaiters(%rdi)
-
-	/* Install cancellation handler.  */
-#ifdef PIC
-	leaq	__condvar_cleanup(%rip), %rsi
-#else
-	leaq	__condvar_cleanup, %rsi
-#endif
-	leaq	48(%rsp), %rdi
-	movq	%rsp, %rdx
-	callq	__pthread_cleanup_push
-
-	/* Get and store current wakeup_seq value.  */
-	movq	8(%rsp), %rdi
-	movq	wakeup_seq(%rdi), %r9
-	movl	broadcast_seq(%rdi), %edx
-	movq	%r9, 40(%rsp)
-	movl	%edx, 4(%rsp)
-
-	/* Get the current time.  */
-8:
-#ifdef __NR_clock_gettime
-	/* Get the clock number.  Note that the field in the condvar
-	   structure stores the number minus 1.  */
-	movq	8(%rsp), %rdi
-	movl	cond_nwaiters(%rdi), %edi
-	andl	$((1 << clock_bits) - 1), %edi
-	/* Only clocks 0 and 1 are allowed so far.  Both are handled in the
-	   kernel.  */
-	leaq	24(%rsp), %rsi
-	movl	$__NR_clock_gettime, %eax
-	syscall
-# ifndef __ASSUME_POSIX_TIMERS
-	cmpq	$-ENOSYS, %rax
-	je	19f
-# endif
-
-	/* Compute relative timeout.  */
-	movq	(%r13), %rcx
-	movq	8(%r13), %rdx
-	subq	24(%rsp), %rcx
-	subq	32(%rsp), %rdx
-#else
-	leaq	24(%rsp), %rdi
-	xorl	%esi, %esi
-	movq	$VSYSCALL_ADDR_vgettimeofday, %rax
-	callq	*%rax
-
-	/* Compute relative timeout.  */
-	movq	32(%rsp), %rax
-	movl	$1000, %edx
-	mul	%rdx		/* Milli seconds to nano seconds.  */
-	movq	(%r13), %rcx
-	movq	8(%r13), %rdx
-	subq	24(%rsp), %rcx
-	subq	%rax, %rdx
-#endif
-	jns	12f
-	addq	$1000000000, %rdx
-	decq	%rcx
-12:	testq	%rcx, %rcx
-	movq	8(%rsp), %rdi
-	movq	$-ETIMEDOUT, %r14
-	js	6f
-
-	/* Store relative timeout.  */
-21:	movq	%rcx, 24(%rsp)
-	movq	%rdx, 32(%rsp)
-
-	movl	cond_futex(%rdi), %r12d
-
-	/* Unlock.  */
-	LOCK
-#if cond_lock == 0
-	decl	(%rdi)
-#else
-	decl	cond_lock(%rdi)
-#endif
-	jne	3f
-
-4:	callq	__pthread_enable_asynccancel
-	movl	%eax, (%rsp)
-
-	leaq	24(%rsp), %r10
-#if FUTEX_WAIT == 0
-	xorl	%esi, %esi
-#else
-	movl	$FUTEX_WAIT, %esi
-#endif
-	movq	%r12, %rdx
-	addq	$cond_futex, %rdi
-	movl	$SYS_futex, %eax
-	syscall
-	movq	%rax, %r14
-
-	movl	(%rsp), %edi
-	callq	__pthread_disable_asynccancel
-
-	/* Lock.  */
-	movq	8(%rsp), %rdi
-	movl	$1, %esi
-	xorl	%eax, %eax
-	LOCK
-#if cond_lock == 0
-	cmpxchgl %esi, (%rdi)
-#else
-	cmpxchgl %esi, cond_lock(%rdi)
-#endif
-	jne	5f
-
-6:	movl	broadcast_seq(%rdi), %edx
-
-	movq	woken_seq(%rdi), %rax
-
-	movq	wakeup_seq(%rdi), %r9
-
-	cmpl	4(%rsp), %edx
-	jne	23f
-
-	cmpq	40(%rsp), %r9
-	jbe	15f
-
-	cmpq	%rax, %r9
-	ja	9f
-
-15:	cmpq	$-ETIMEDOUT, %r14
-	jne	8b
-
-13:	incq	wakeup_seq(%rdi)
-	incl	cond_futex(%rdi)
-	movl	$ETIMEDOUT, %r14d
-	jmp	14f
-
-23:	xorq	%r14, %r14
-	jmp	24f
-
-9:	xorq	%r14, %r14
-14:	incq	woken_seq(%rdi)
-
-24:	subl	$(1 << clock_bits), cond_nwaiters(%rdi)
-
-	/* Wake up a thread which wants to destroy the condvar object.  */
-	cmpq	$0xffffffffffffffff, total_seq(%rdi)
-	jne	25f
-	movl	cond_nwaiters(%rdi), %eax
-	andl	$~((1 << clock_bits) - 1), %eax
-	jne	25f
-
-	addq	$cond_nwaiters, %rdi
-	movl	$SYS_futex, %eax
-	movl	$FUTEX_WAKE, %esi
-	movl	$1, %edx
-	syscall
-	subq	$cond_nwaiters, %rdi
-
-25:	LOCK
-#if cond_lock == 0
-	decl	(%rdi)
-#else
-	decl	cond_lock(%rdi)
-#endif
-	jne	10f
-
-	/* Remove cancellation handler.  */
-11:	movq	48+CLEANUP_PREV(%rsp), %rdx
-	movq	%rdx, %fs:CLEANUP
-
-	movq	16(%rsp), %rdi
-	callq	__pthread_mutex_cond_lock
-
-	testq	%rax, %rax
-	cmoveq	%r14, %rax
-
-18:	addq	$FRAME_SIZE, %rsp
-.Laddq:
-	popq	%r14
-.Lpop_r14:
-	popq	%r13
-.Lpop_r13:
-	popq	%r12
-.Lpop_r12:
-
-	retq
-
-	/* Initial locking failed.  */
-1:
-.LSbl1:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_lock_wait
-	jmp	2b
-
-	/* Unlock in loop requires wakeup.  */
-3:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_unlock_wake
-	jmp	4b
-
-	/* Locking in loop failed.  */
-5:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_lock_wait
-#if cond_lock != 0
-	subq	$cond_lock, %rdi
-#endif
-	jmp	6b
-
-	/* Unlock after loop requires wakeup.  */
-10:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_unlock_wake
-	jmp	11b
-
-	/* The initial unlocking of the mutex failed.  */
-16:	movq	8(%rsp), %rdi
-	movq	%rax, (%rsp)
-	LOCK
-#if cond_lock == 0
-	decl	(%rdi)
-#else
-	decl	cond_lock(%rdi)
-#endif
-	jne	17f
-
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_unlock_wake
-
-17:	movq	(%rsp), %rax
-	jmp	18b
-
-#if defined __NR_clock_gettime && !defined __ASSUME_POSIX_TIMERS
-	/* clock_gettime not available.  */
-19:	leaq	24(%rsp), %rdi
-	xorl	%esi, %esi
-	movq	$VSYSCALL_ADDR_vgettimeofday, %rax
-	callq	*%rax
-
-	/* Compute relative timeout.  */
-	movq	32(%rsp), %rax
-	movl	$1000, %edx
-	mul	%rdx		/* Milli seconds to nano seconds.  */
-	movq	(%r13), %rcx
-	movq	8(%r13), %rdx
-	subq	24(%rsp), %rcx
-	subq	%rax, %rdx
-	jns	20f
-	addq	$1000000000, %rdx
-	decq	%rcx
-20:	testq	%rcx, %rcx
-	movq	8(%rsp), %rdi
-	movq	$-ETIMEDOUT, %r14
-	js	6b
-	jmp	21b
-#endif
-.LENDCODE:
-	.size	__pthread_cond_timedwait, .-__pthread_cond_timedwait
-versioned_symbol (libpthread, __pthread_cond_timedwait, pthread_cond_timedwait,
-		  GLIBC_2_3_2)
-
-
-	.section .eh_frame,"a",@progbits
-.LSTARTFRAME:
-	.long	L(ENDCIE)-L(STARTCIE)		# Length of the CIE.
-.LSTARTCIE:
-	.long	0				# CIE ID.
-	.byte	1				# Version number.
-#ifdef SHARED
-	.string	"zR"				# NUL-terminated augmentation
-						# string.
-#else
-	.ascii	"\0"				# NUL-terminated augmentation
-						# string.
-#endif
-	.uleb128 1				# Code alignment factor.
-	.sleb128 -8				# Data alignment factor.
-	.byte	16				# Return address register
-						# column.
-#ifdef SHARED
-	.uleb128 1				# Augmentation value length.
-	.byte	0x1b				# Encoding: DW_EH_PE_pcrel
-						# + DW_EH_PE_sdata4.
-#endif
-	.byte 0x0c				# DW_CFA_def_cfa
-	.uleb128 7
-	.uleb128 8
-	.byte	0x90				# DW_CFA_offset, column 0x8
-	.uleb128 1
-	.align 8
-.LENDCIE:
-
-	.long	.LENDFDE-.LSTARTFDE		# Length of the FDE.
-.LSTARTFDE:
-	.long	.LSTARTFDE-.LSTARTFRAME		# CIE pointer.
-#ifdef SHARED
-	.long	.LSTARTCODE-.			# PC-relative start address
-						# of the code
-#else
-	.long	.LSTARTCODE			# Start address of the code.
-#endif
-	.long	.LENDCODE-.LSTARTCODE		# Length of the code.
-#ifdef SHARED
-	.uleb128 0				# No augmentation data.
-#endif
-	.byte	0x40+.Lpush_r12-.LSTARTCODE	# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 16
-	.byte	0x8c				# DW_CFA_offset %r12
-	.uleb128 2
-	.byte	0x40+.Lpush_r13-.Lpush_r12	# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 24
-	.byte	0x8d				# DW_CFA_offset %r13
-	.uleb128 3
-	.byte	0x40+.Lpush_r14-.Lpush_r13	# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 32
-	.byte	0x84				# DW_CFA_offset %r14
-	.uleb128 4
-	.byte	0x40+.Lsubq-.Lpush_r14		# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 32+FRAME_SIZE
-	.byte	3				# DW_CFA_advance_loc2
-	.2byte	.Laddq-.Lsubq
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 32
-	.byte	0x40+.Lpop_r14-.Laddq		# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 24
-	.byte	0xce				# DW_CFA_restore %r14
-	.byte	0x40+.Lpop_r13-.Lpop_r14	# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 16
-	.byte	0xcd				# DW_CFA_restore %r13
-	.byte	0x40+.Lpop_r12-.Lpop_r13	# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 8
-	.byte	0xcc				# DW_CFA_restore %r12
-	.byte	0x40+.LSbl1-.Lpop_r12		# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 32+FRAME_SIZE
-	.byte	0x8c				# DW_CFA_offset %r12
-	.uleb128 2
-	.byte	0x8d				# DW_CFA_offset %r13
-	.uleb128 3
-	.byte	0x84				# DW_CFA_offset %r14
-	.uleb128 4
-	.align	8
-.LENDFDE:
diff -uprN glibc-2.5-20061008T1257_org/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S glibc-2.5-20061008T1257/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
--- glibc-2.5-20061008T1257_org/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S	2009-01-19 13:52:14.000000000 +0530
+++ glibc-2.5-20061008T1257/nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S	1970-01-01 05:30:00.000000000 +0530
@@ -1,429 +0,0 @@
-/* Copyright (C) 2002, 2003, 2004, 2005, 2006 Free Software Foundation, Inc.
-   This file is part of the GNU C Library.
-   Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
-
-   The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
-
-   The GNU C Library is distributed in the hope that it will be useful,
-   but WITHOUT ANY WARRANTY; without even the implied warranty of
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
-
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
-   02111-1307 USA.  */
-
-#include <sysdep.h>
-#include <shlib-compat.h>
-#include <lowlevelcond.h>
-#include <tcb-offsets.h>
-
-#ifdef UP
-# define LOCK
-#else
-# define LOCK lock
-#endif
-
-#define SYS_futex		202
-#define FUTEX_WAIT		0
-#define FUTEX_WAKE		1
-
-
-	.text
-
-	.align	16
-	.type	__condvar_cleanup, @function
-	.globl	__condvar_cleanup
-	.hidden	__condvar_cleanup
-__condvar_cleanup:
-	pushq	%r12
-
-	/* Get internal lock.  */
-	movq	%rdi, %r8
-	movq	8(%rdi), %rdi
-	movl	$1, %esi
-	xorl	%eax, %eax
-	LOCK
-#if cond_lock == 0
-	cmpxchgl %esi, (%rdi)
-#else
-	cmpxchgl %esi, cond_lock(%rdi)
-#endif
-	jz	1f
-
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_lock_wait
-#if cond_lock != 0
-	subq	$cond_lock, %rdi
-#endif
-
-1:	movl	broadcast_seq(%rdi), %edx
-	cmpl	4(%r8), %edx
-	jne	3f
-
-	/* We increment the wakeup_seq counter only if it is lower than
-	   total_seq.  If this is not the case the thread was woken and
-	   then canceled.  In this case we ignore the signal.  */
-	movq	total_seq(%rdi), %rax
-	cmpq	wakeup_seq(%rdi), %rax
-	jbe	6f
-	incq	wakeup_seq(%rdi)
-	incl	cond_futex(%rdi)
-6:	incq	woken_seq(%rdi)
-
-3:	subl	$(1 << clock_bits), cond_nwaiters(%rdi)
-
-	/* Wake up a thread which wants to destroy the condvar object.  */
-	xorq	%r12, %r12
-	cmpq	$0xffffffffffffffff, total_seq(%rdi)
-	jne	4f
-	movl	cond_nwaiters(%rdi), %eax
-	andl	$~((1 << clock_bits) - 1), %eax
-	jne	4f
-
-	addq	$cond_nwaiters, %rdi
-	movl	$SYS_futex, %eax
-	movl	$FUTEX_WAKE, %esi
-	movl	$1, %edx
-	syscall
-	subq	$cond_nwaiters, %rdi
-	movl	$1, %r12d
-
-4:	LOCK
-#if cond_lock == 0
-	decl	(%rdi)
-#else
-	decl	cond_lock(%rdi)
-#endif
-	je	2f
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_unlock_wake
-
-	/* Wake up all waiters to make sure no signal gets lost.  */
-2:	testq	%r12, %r12
-	jnz	5f
-	addq	$cond_futex, %rdi
-	movl	$FUTEX_WAKE, %esi
-	movl	$0x7fffffff, %edx
-	movl	$SYS_futex, %eax
-	syscall
-
-5:	movq	16(%r8), %rdi
-	callq	__pthread_mutex_cond_lock
-
-	popq	%r12
-
-	retq
-	.size	__condvar_cleanup, .-__condvar_cleanup
-
-
-/* int pthread_cond_wait (pthread_cond_t *cond, pthread_mutex_t *mutex)  */
-	.globl	__pthread_cond_wait
-	.type	__pthread_cond_wait, @function
-	.align	16
-__pthread_cond_wait:
-.LSTARTCODE:
-	pushq	%r12
-.Lpush_r12:
-#define FRAME_SIZE 64
-	subq	$FRAME_SIZE, %rsp
-.Lsubq:
-	/* Stack frame:
-
-	   rsp + 64
-	            +--------------------------+
-	   rsp + 32 | cleanup buffer           |
-		    +--------------------------+
-	   rsp + 24 | old wake_seq value       |
-	            +--------------------------+
-	   rsp + 16 | mutex pointer            |
-	            +--------------------------+
-	   rsp +  8 | condvar pointer          |
-	            +--------------------------+
-	   rsp +  4 | old broadcast_seq value  |
-	            +--------------------------+
-	   rsp +  0 | old cancellation mode    |
-	            +--------------------------+
-	*/
-
-	cmpq	$-1, dep_mutex(%rdi)
-
-		/* Prepare structure passed to cancellation handler.  */
-	movq	%rdi, 8(%rsp)
-	movq	%rsi, 16(%rsp)
-
-	je	15f
-	movq	%rsi, dep_mutex(%rdi)
-
-	/* Get internal lock.  */
-15:	movl	$1, %esi
-	xorl	%eax, %eax
-	LOCK
-#if cond_lock == 0
-	cmpxchgl %esi, (%rdi)
-#else
-	cmpxchgl %esi, cond_lock(%rdi)
-#endif
-	jne	1f
-
-	/* Unlock the mutex.  */
-2:	movq	16(%rsp), %rdi
-	xorl	%esi, %esi
-	callq	__pthread_mutex_unlock_usercnt
-
-	testl	%eax, %eax
-	jne	12f
-
-	movq	8(%rsp), %rdi
-	incq	total_seq(%rdi)
-	incl	cond_futex(%rdi)
-	addl	$(1 << clock_bits), cond_nwaiters(%rdi)
-
-	/* Install cancellation handler.  */
-#ifdef PIC
-	leaq	__condvar_cleanup(%rip), %rsi
-#else
-	leaq	__condvar_cleanup, %rsi
-#endif
-	leaq	32(%rsp), %rdi
-	movq	%rsp, %rdx
-	callq	__pthread_cleanup_push
-
-	/* Get and store current wakeup_seq value.  */
-	movq	8(%rsp), %rdi
-	movq	wakeup_seq(%rdi), %r9
-	movl	broadcast_seq(%rdi), %edx
-	movq	%r9, 24(%rsp)
-	movl	%edx, 4(%rsp)
-
-	/* Unlock.  */
-8:	movl	cond_futex(%rdi), %r12d
-	LOCK
-#if cond_lock == 0
-	decl	(%rdi)
-#else
-	decl	cond_lock(%rdi)
-#endif
-	jne	3f
-
-4:	callq	__pthread_enable_asynccancel
-	movl	%eax, (%rsp)
-
-	movq	8(%rsp), %rdi
-	xorq	%r10, %r10
-	movq	%r12, %rdx
-	addq	$cond_futex-cond_lock, %rdi
-	movl	$SYS_futex, %eax
-#if FUTEX_WAIT == 0
-	xorl	%esi, %esi
-#else
-	movl	$FUTEX_WAIT, %esi
-#endif
-	syscall
-
-	movl	(%rsp), %edi
-	callq	__pthread_disable_asynccancel
-
-	/* Lock.  */
-	movq	8(%rsp), %rdi
-	movl	$1, %esi
-	xorl	%eax, %eax
-	LOCK
-#if cond_lock == 0
-	cmpxchgl %esi, (%rdi)
-#else
-	cmpxchgl %esi, cond_lock(%rdi)
-#endif
-	jnz	5f
-
-6:	movl	broadcast_seq(%rdi), %edx
-
-	movq	woken_seq(%rdi), %rax
-
-	movq	wakeup_seq(%rdi), %r9
-
-	cmpl	4(%rsp), %edx
-	jne	16f
-
-	cmpq	24(%rsp), %r9
-	jbe	8b
-
-	cmpq	%rax, %r9
-	jna	8b
-
-	incq	woken_seq(%rdi)
-
-	/* Unlock */
-16:	subl	$(1 << clock_bits), cond_nwaiters(%rdi)
-
-	/* Wake up a thread which wants to destroy the condvar object.  */
-	cmpq	$0xffffffffffffffff, total_seq(%rdi)
-	jne	17f
-	movl	cond_nwaiters(%rdi), %eax
-	andl	$~((1 << clock_bits) - 1), %eax
-	jne	17f
-
-	addq	$cond_nwaiters, %rdi
-	movl	$SYS_futex, %eax
-	movl	$FUTEX_WAKE, %esi
-	movl	$1, %edx
-	syscall
-	subq	$cond_nwaiters, %rdi
-
-17:	LOCK
-#if cond_lock == 0
-	decl	(%rdi)
-#else
-	decl	cond_lock(%rdi)
-#endif
-	jne	10f
-
-	/* Remove cancellation handler.  */
-11:	movq	32+CLEANUP_PREV(%rsp), %rdx
-	movq	%rdx, %fs:CLEANUP
-
-	movq	16(%rsp), %rdi
-	callq	__pthread_mutex_cond_lock
-14:	addq	$FRAME_SIZE, %rsp
-.Laddq:
-
-	popq	%r12
-.Lpop_r12:
-
-	/* We return the result of the mutex_lock operation.  */
-	retq
-
-	/* Initial locking failed.  */
-1:
-.LSbl1:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_lock_wait
-	jmp	2b
-
-	/* Unlock in loop requires wakeup.  */
-3:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_unlock_wake
-	jmp	4b
-
-	/* Locking in loop failed.  */
-5:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_lock_wait
-#if cond_lock != 0
-	subq	$cond_lock, %rdi
-#endif
-	jmp	6b
-
-	/* Unlock after loop requires wakeup.  */
-10:
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_unlock_wake
-	jmp	11b
-
-	/* The initial unlocking of the mutex failed.  */
-12:	movq	%rax, %r10
-	movq	8(%rsp), %rdi
-	LOCK
-#if cond_lock == 0
-	decl	(%rdi)
-#else
-	decl	cond_lock(%rdi)
-#endif
-	jne	13f
-
-#if cond_lock != 0
-	addq	$cond_lock, %rdi
-#endif
-	callq	__lll_mutex_unlock_wake
-
-13:	movq	%r10, %rax
-	jmp	14b
-.LENDCODE:
-	.size	__pthread_cond_wait, .-__pthread_cond_wait
-versioned_symbol (libpthread, __pthread_cond_wait, pthread_cond_wait,
-		  GLIBC_2_3_2)
-
-
-	.section .eh_frame,"a",@progbits
-.LSTARTFRAME:
-	.long	L(ENDCIE)-L(STARTCIE)		# Length of the CIE.
-.LSTARTCIE:
-	.long	0				# CIE ID.
-	.byte	1				# Version number.
-#ifdef SHARED
-	.string	"zR"				# NUL-terminated augmentation
-						# string.
-#else
-	.ascii	"\0"				# NUL-terminated augmentation
-						# string.
-#endif
-	.uleb128 1				# Code alignment factor.
-	.sleb128 -8				# Data alignment factor.
-	.byte	16				# Return address register
-						# column.
-#ifdef SHARED
-	.uleb128 1				# Augmentation value length.
-	.byte	0x1b				# Encoding: DW_EH_PE_pcrel
-						# + DW_EH_PE_sdata4.
-#endif
-	.byte 0x0c				# DW_CFA_def_cfa
-	.uleb128 7
-	.uleb128 8
-	.byte	0x90				# DW_CFA_offset, column 0x8
-	.uleb128 1
-	.align 8
-.LENDCIE:
-
-	.long	.LENDFDE-.LSTARTFDE		# Length of the FDE.
-.LSTARTFDE:
-	.long	.LSTARTFDE-.LSTARTFRAME		# CIE pointer.
-#ifdef SHARED
-	.long	.LSTARTCODE-.			# PC-relative start address
-						# of the code
-#else
-	.long	.LSTARTCODE			# Start address of the code.
-#endif
-	.long	.LENDCODE-.LSTARTCODE		# Length of the code.
-#ifdef SHARED
-	.uleb128 0				# No augmentation data.
-#endif
-	.byte	0x40+.Lpush_r12-.LSTARTCODE	# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 16
-	.byte	0x8c				# DW_CFA_offset %r12
-	.uleb128 2
-	.byte	0x40+.Lsubq-.Lpush_r12		# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 16+FRAME_SIZE
-	.byte	3				# DW_CFA_advance_loc2
-	.2byte	.Laddq-.Lsubq
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 16
-	.byte	0x40+.Lpop_r12-.Laddq		# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 8
-	.byte	0xcc				# DW_CFA_restore %r12
-	.byte	0x40+.LSbl1-.Lpop_r12		# DW_CFA_advance_loc+N
-	.byte	14				# DW_CFA_def_cfa_offset
-	.uleb128 80
-	.byte	0x8c				# DW_CFA_offset %r12
-	.uleb128 2
-	.align	8
-.LENDFDE:

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 6/8] RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags
  2009-03-30 21:40   ` Eric Dumazet
@ 2009-03-30 22:40     ` Darren Hart
  0 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-30 22:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, Thomas Gleixner, Sripathi Kodi, Peter Zijlstra,
	John Stultz, Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Ingo Molnar, Jakub Jelinek

Eric Dumazet wrote:

Hi Eric,

Thanks for the prompt review...

> Darren Hart a écrit :

>> -	t.tv64 = restart->futex.time;
>> +	if (restart->futex.flags | FLAGS_HAS_TIMEOUT) {
> 
> You probably meant :
> 
> if (restart->futex.flags & FLAGS_HAS_TIMEOUT)

Gah! You are correct.  Corrected patch below.


RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags

From: Darren Hart <dvhltc@us.ibm.com>

Currently restart is only used if there is a timeout.  The requeue_pi
functionality requires restarting to futex_lock_pi() on signal after
wakeup in futex_wait_requeue_pi() regardless of if there was a timeout
or not.  Using 0 for the timeout value is confusing as that could
indicate an expired timer.  The flag makes this explicit.

Changelog:
V7pre: -Corrected FLAGS_HAS_TIMEOUT flag detection logic per Eric Dumazet
V6: - Initial version

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 kernel/futex.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)


diff --git a/kernel/futex.c b/kernel/futex.c
index dd05803..07e314c 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1254,6 +1254,7 @@ handle_fault:
  */
 #define FLAGS_SHARED		0x01
 #define FLAGS_CLOCKRT		0x02
+#define FLAGS_HAS_TIMEOUT	0x04
 
 static long futex_wait_restart(struct restart_block *restart);
 
@@ -1493,7 +1494,7 @@ retry_private:
 	restart->futex.val = val;
 	restart->futex.time = abs_time->tv64;
 	restart->futex.bitset = bitset;
-	restart->futex.flags = 0;
+	restart->futex.flags = FLAGS_HAS_TIMEOUT;
 
 	if (fshared)
 		restart->futex.flags |= FLAGS_SHARED;
@@ -1517,13 +1518,16 @@ static long futex_wait_restart(struct restart_block *restart)
 {
 	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
 	int fshared = 0;
-	ktime_t t;
+	ktime_t t, *tp = NULL;
 
-	t.tv64 = restart->futex.time;
+	if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
+		t.tv64 = restart->futex.time;
+		tp = &t;
+	}
 	restart->fn = do_no_restart_syscall;
 	if (restart->futex.flags & FLAGS_SHARED)
 		fshared = 1;
-	return (long)futex_wait(uaddr, fshared, restart->futex.val, &t,
+	return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
 				restart->futex.bitset,
 				restart->futex.flags & FLAGS_CLOCKRT);
 }

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls
  2009-03-30 21:47   ` Eric Dumazet
@ 2009-03-30 22:44     ` Darren Hart
  2009-03-30 23:31       ` Darren Hart
  0 siblings, 1 reply; 23+ messages in thread
From: Darren Hart @ 2009-03-30 22:44 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, Thomas Gleixner, Sripathi Kodi, Peter Zijlstra,
	John Stultz, Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Ingo Molnar, Jakub Jelinek

Eric Dumazet wrote:

Two more nice catches, thanks.  Corrected patch below.

>> +static long futex_lock_pi_restart(struct restart_block *restart)
>> +{
>> +	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
>> +	ktime_t t, *tp = NULL;
>> +	int fshared = restart->futex.flags & FLAGS_SHARED;
>> +
>> +	if (restart->futex.flags | FLAGS_HAS_TIMEOUT) {
> 
> if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {

 
> if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
> 
>> +		t.tv64 = restart->futex.time;
>> +		tp = &t;
>> +	}
>> +	restart->fn = do_no_restart_syscall;
>> +
> 
> 
> Strange your compiler dit not complains...

Well, the comparison with an "|" is still valid - just happens to always
be true :-)  I didn't get any errors - perhaps I should be compiling
with some addition options?


RFC: futex: add requeue_pi calls

From: Darren Hart <dvhltc@us.ibm.com>

PI Futexes and their underlying rt_mutex cannot be left ownerless if there are
pending waiters as this will break the PI boosting logic, so the standard
requeue commands aren't sufficient.  The new commands properly manage pi futex
ownership by ensuring a futex with waiters has an owner at all times.  This
will allow glibc to properly handle pi mutexes with pthread_condvars.

The approach taken here is to create two new futex op codes:

FUTEX_WAIT_REQUEUE_PI:
Tasks will use this op code to wait on a futex (such as a non-pi waitqueue)
and wake after they have been requeued to a pi futex.  Prior to returning to
userspace, they will acquire this pi futex (and the underlying rt_mutex).

futex_wait_requeue_pi() is the result of a high speed collision between
futex_wait() and futex_lock_pi() (with the first part of futex_lock_pi() being
done by futex_proxy_trylock_atomic() on behalf of the top_waiter).

FUTEX_REQUEUE_PI (and FUTEX_CMP_REQUEUE_PI):
This call must be used to wake tasks waiting with FUTEX_WAIT_REQUEUE_PI,
regardless of how many tasks the caller intends to wake or requeue.
pthread_cond_broadcast() should call this with nr_wake=1 and
nr_requeue=INT_MAX.  pthread_cond_signal() should call this with nr_wake=1 and
nr_requeue=0.  The reason being we need both callers to get the benefit of the
futex_proxy_trylock_atomic() routine.  futex_requeue() also enqueues the
top_waiter on the rt_mutex via rt_mutex_start_proxy_lock().

Changelog:
V7pre: -Corrected FLAGS_HAS_TIMEOUT flag detection logic per Eric Dumazet
V6: -Moved non requeue_pi related fixes/changes into separate patches
    -Make use of new double_unlock_hb()
    -Futex key management updates
    -Removed unnecessary futex_requeue_pi_cleanup() routine
    -Return -EINVAL if futex_wake is called with q.rt_waiter != NULL
    -Rewrote futex_wait_requeue_pi() wakeup logic
    -Rewrote requeue/wakeup loop
    -Renamed futex_requeue_pi_init() to futex_proxy_trylock_atomic()
    -Handle third party owner, removed -EMORON :-(
    -Comment updates
V5: -Update futex_requeue to allow for nr_requeue == 0
    -Whitespace cleanup
    -Added task_count var to futex_requeue to avoid confusion between
     ret, res, and ret used to count wakes and requeues
V4: -Cleanups to pass checkpatch.pl
    -Added missing goto out; in futex_wait_requeue_pi()
    -Moved rt_mutex_handle_wakeup to the rt_mutex_enqueue_task patch as they
     are a functional pair.
    -Fixed several error exit paths that failed to unqueue the futex_q, which
     not only would leave the futex_q on the hb, but would have caused an exit
     race with the waiter since they weren't synchonized on the hb lock.  Thanks
     Sripathi for catching this.
    -Fix pi_state handling in futex_requeue
    -Several other minor fixes to futex_requeue_pi
    -add requeue_futex function and force the requeue in requeue_pi even for the
     task we wake in the requeue loop
    -refill the pi state cache at the beginning of futex_requeue for requeue_pi
    -have futex_requeue_pi_init ensure it stores off the pi_state for use in
     futex_requeue
    - Delayed starting the hrtimer until after TASK_INTERRUPTIBLE is set
    - Fixed NULL pointer bug when futex_wait_requeue_pi() has no timer and
      receives a signal after waking on uaddr2.  Added has_timeout to the
      restart->futex structure.
V3: -Added FUTEX_CMP_REQUEUE_PI op
    -Put fshared support back.  So long as it is encoded in the op code, we
     assume both the uaddr's are either private or share, but not mixed.
    -Fixed access to expected value of uaddr2 in futex_wait_requeue_pi()
V2: -Added rt_mutex enqueueing to futex_requeue_pi_init
    -Updated fault handling and exit logic
V1: -Initial verion

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Sripathi Kodi <sripathik@in.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Dinakar Guniguntala <dino@in.ibm.com>
Cc: Ulrich Drepper <drepper@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jakub Jelinek <jakub@redhat.com>
---

 include/linux/futex.h       |    8 +
 include/linux/thread_info.h |    3 
 kernel/futex.c              |  533 +++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 524 insertions(+), 20 deletions(-)


diff --git a/include/linux/futex.h b/include/linux/futex.h
index 3bf5bb5..b05519c 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -23,6 +23,9 @@ union ktime;
 #define FUTEX_TRYLOCK_PI	8
 #define FUTEX_WAIT_BITSET	9
 #define FUTEX_WAKE_BITSET	10
+#define FUTEX_WAIT_REQUEUE_PI	11
+#define FUTEX_REQUEUE_PI	12
+#define FUTEX_CMP_REQUEUE_PI	13
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
@@ -38,6 +41,11 @@ union ktime;
 #define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAIT_BITSET_PRIVATE	(FUTEX_WAIT_BITS | FUTEX_PRIVATE_FLAG)
 #define FUTEX_WAKE_BITSET_PRIVATE	(FUTEX_WAKE_BITS | FUTEX_PRIVATE_FLAG)
+#define FUTEX_WAIT_REQUEUE_PI_PRIVATE	(FUTEX_WAIT_REQUEUE_PI | \
+					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_REQUEUE_PI_PRIVATE	(FUTEX_REQUEUE_PI | FUTEX_PRIVATE_FLAG)
+#define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
+					 FUTEX_PRIVATE_FLAG)
 
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
index e6b820f..a8cc4e1 100644
--- a/include/linux/thread_info.h
+++ b/include/linux/thread_info.h
@@ -21,13 +21,14 @@ struct restart_block {
 		struct {
 			unsigned long arg0, arg1, arg2, arg3;
 		};
-		/* For futex_wait */
+		/* For futex_wait and futex_wait_requeue_pi */
 		struct {
 			u32 *uaddr;
 			u32 val;
 			u32 flags;
 			u32 bitset;
 			u64 time;
+			u32 *uaddr2;
 		} futex;
 		/* For nanosleep */
 		struct {
diff --git a/kernel/futex.c b/kernel/futex.c
index a9c7da1..115ec52 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -19,6 +19,10 @@
  *  PRIVATE futexes by Eric Dumazet
  *  Copyright (C) 2007 Eric Dumazet <dada1@cosmosbay.com>
  *
+ *  Requeue-PI support by Darren Hart <dvhltc@us.ibm.com>
+ *  Copyright (C) IBM Corporation, 2009
+ *  Thanks to Thomas Gleixner for conceptual design and careful reviews.
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -109,6 +113,9 @@ struct futex_q {
 	struct futex_pi_state *pi_state;
 	struct task_struct *task;
 
+	/* rt_waiter storage for requeue_pi: */
+	struct rt_mutex_waiter *rt_waiter;
+
 	/* Bitset for the optional bitmasked wakeup */
 	u32 bitset;
 };
@@ -829,7 +836,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 
 	plist_for_each_entry_safe(this, next, head, list) {
 		if (match_futex (&this->key, &key)) {
-			if (this->pi_state) {
+			if (this->pi_state || this->rt_waiter) {
 				ret = -EINVAL;
 				break;
 			}
@@ -970,20 +977,116 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
 	q->key = *key2;
 }
 
-/*
- * Requeue all waiters hashed on one physical page to another
- * physical page.
+/**
+ * futex_proxy_trylock_atomic() - Attempt an atomic lock for the top waiter
+ * @pifutex:	the user address of the to futex
+ * @hb1:	the from futex hash bucket, must be locked by the caller
+ * @hb2:	the to futex hash bucket, must be locked by the caller
+ * @key1:	the from futex key
+ * @key2:	the to futex key
+ *
+ * Try and get the lock on behalf of the top waiter if we can do it atomically.
+ * Wake the top waiter if we succeed.  hb1 and hb2 must be held by the caller.
+ *
+ * Faults occur for two primary reasons at this point:
+ * 1) The address isn't mapped
+ * 2) The address isn't writeable
+ *
+ * We return EFAULT on either of these cases and rely on the caller to handle
+ * them.
+ *
+ * Returns:
+ *  0 - failed to acquire the lock atomicly
+ *  1 - acquired the lock
+ * <0 - error
+ */
+static int futex_proxy_trylock_atomic(u32 __user *pifutex,
+				 struct futex_hash_bucket *hb1,
+				 struct futex_hash_bucket *hb2,
+				 union futex_key *key1, union futex_key *key2,
+				 struct futex_pi_state **ps)
+{
+	struct futex_q *top_waiter;
+	u32 curval;
+	int ret;
+
+	if (get_futex_value_locked(&curval, pifutex))
+		return -EFAULT;
+
+	top_waiter = futex_top_waiter(hb1, key1);
+
+	/* There are no waiters, nothing for us to do. */
+	if (!top_waiter)
+		return 0;
+
+	/*
+	 * Either take the lock for top_waiter or set the FUTEX_WAITERS bit.
+	 * The pi_state is returned in ps in contended cases.
+	 */
+	ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task);
+	if (ret == 1) {
+		/*
+		 * Set the top_waiter key for the requeue target futex so the
+		 * waiter can detect the wakeup on the right futex, but remove
+		 * it from the hb so it can detect atomic lock acquisition.
+		 */
+		drop_futex_key_refs(&top_waiter->key);
+		get_futex_key_refs(key2);
+		top_waiter->key = *key2;
+		WARN_ON(plist_node_empty(&top_waiter->list));
+		plist_del(&top_waiter->list, &top_waiter->list.plist);
+		/*
+		 * FIXME: wake_futex() wakes first, then nulls the lock_ptr,
+		 * and uses a memory barrier.  Do we need to?
+		 */
+		top_waiter->lock_ptr = NULL;
+		wake_up(&top_waiter->waiter);
+	}
+
+	return ret;
+}
+
+/**
+ * futex_requeue() - Requeue waiters from uaddr1 to uaddr2
+ * uaddr1:	source futex user address
+ * uaddr2:	target futex user address
+ * nr_wake:	number of waiters to wake (must be 1 for requeue_pi)
+ * nr_requeue:	number of waiters to requeue (0-INT_MAX)
+ * requeue_pi:	if we are attempting to requeue from a non-pi futex to a
+ * 		pi futex (pi to pi requeue is not supported)
+ *
+ * Requeue waiters on uaddr1 to uaddr2. In the requeue_pi case, try to acquire
+ * uaddr2 atomically on behalf of the top waiter.
+ *
+ * Returns:
+ * >=0: on success, the number of tasks requeued or woken
+ *  <0: on error
  */
 static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
-			 int nr_wake, int nr_requeue, u32 *cmpval)
+			 int nr_wake, int nr_requeue, u32 *cmpval,
+			 int requeue_pi)
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
+	int drop_count = 0, task_count = 0, ret;
+	struct futex_pi_state *pi_state = NULL;
 	struct futex_hash_bucket *hb1, *hb2;
 	struct plist_head *head1;
 	struct futex_q *this, *next;
-	int ret, drop_count = 0;
+	u32 curval2;
+
+	if (requeue_pi) {
+		if (refill_pi_state_cache())
+			return -ENOMEM;
+		if (nr_wake != 1)
+			return -EINVAL;
+	}
 
 retry:
+	if (pi_state != NULL) {
+		free_pi_state(pi_state);
+		pi_state = NULL;
+	}
+
 	ret = get_futex_key(uaddr1, fshared, &key1);
 	if (unlikely(ret != 0))
 		goto out;
@@ -1022,19 +1125,92 @@ retry_private:
 		}
 	}
 
+	if (requeue_pi) {
+		/* Attempt to acquire uaddr2 and wake the top_waiter. */
+		ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
+						 &key2, &pi_state);
+
+		/*
+		 * At this point the top_waiter has either taken uaddr2 or is
+		 * waiting on it.  If the former, then the pi_state will not
+		 * exist yet, look it up one more time to ensure we have a
+		 * reference to it.
+		 */
+		if (ret == 1 && !pi_state) {
+			task_count++;
+			ret = get_futex_value_locked(&curval2, uaddr2);
+			if (!ret)
+				ret = lookup_pi_state(curval2, hb2, &key2,
+						      &pi_state);
+		}
+
+		switch (ret) {
+		case 0:
+			break;
+		case -EFAULT:
+			double_unlock_hb(hb1, hb2);
+			put_futex_key(fshared, &key2);
+			put_futex_key(fshared, &key1);
+			ret = get_user(curval2, uaddr2);
+			if (!ret)
+				goto retry;
+			goto out;
+		case -EAGAIN:
+			/* The owner was exiting, try again. */
+			double_unlock_hb(hb1, hb2);
+			put_futex_key(fshared, &key2);
+			put_futex_key(fshared, &key1);
+			cond_resched();
+			goto retry;
+		default:
+			goto out_unlock;
+		}
+	}
+
 	head1 = &hb1->chain;
 	plist_for_each_entry_safe(this, next, head1, list) {
-		if (!match_futex (&this->key, &key1))
+		if (task_count - nr_wake >= nr_requeue)
+			break;
+
+		if (!match_futex(&this->key, &key1))
 			continue;
-		if (++ret <= nr_wake) {
+
+		/* This can go after we're satisfied with testing. */
+		if (!requeue_pi)
+			WARN_ON(this->rt_waiter);
+
+		/*
+		 * Wake nr_wake waiters.  For requeue_pi, if we acquired the
+		 * lock, we already woke the top_waiter.  If not, it will be
+		 * woken by futex_unlock_pi().
+		 */
+		if (++task_count <= nr_wake && !requeue_pi) {
 			wake_futex(this);
-		} else {
-			requeue_futex(this, hb1, hb2, &key2);
-			drop_count++;
+			continue;
+		}
 
-			if (ret - nr_wake >= nr_requeue)
-				break;
+		/*
+		 * Requeue nr_requeue waiters and possibly one more in the case
+		 * of requeue_pi if we couldn't acquire the lock atomically.
+		 */
+		if (requeue_pi) {
+			/* This can go after we're satisfied with testing. */
+			WARN_ON(!this->rt_waiter);
+
+			/* Prepare the waiter to take the rt_mutex. */
+			atomic_inc(&pi_state->refcount);
+			this->pi_state = pi_state;
+			ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
+							this->rt_waiter,
+							this->task, 1);
+			if (ret) {
+				this->pi_state = NULL;
+				free_pi_state(pi_state);
+				goto out_unlock;
+			}
 		}
+		requeue_futex(this, hb1, hb2, &key2);
+		drop_count++;
 	}
 
 out_unlock:
@@ -1049,7 +1225,9 @@ out_put_keys:
 out_put_key1:
 	put_futex_key(fshared, &key1);
 out:
-	return ret;
+	if (pi_state != NULL)
+		free_pi_state(pi_state);
+	return ret ? ret : task_count;
 }
 
 /* The key must be already stored in q->key. */
@@ -1272,6 +1450,8 @@ handle_fault:
 #define FLAGS_HAS_TIMEOUT	0x04
 
 static long futex_wait_restart(struct restart_block *restart);
+static long futex_wait_requeue_pi_restart(struct restart_block *restart);
+static long futex_lock_pi_restart(struct restart_block *restart);
 
 /**
  * finish_futex_lock_pi() - Post lock pi_state and corner case management
@@ -1419,6 +1599,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
 
 	q.pi_state = NULL;
 	q.bitset = bitset;
+	q.rt_waiter = NULL;
 
 	if (abs_time) {
 		unsigned long slack;
@@ -1575,6 +1756,7 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
 	}
 
 	q.pi_state = NULL;
+	q.rt_waiter = NULL;
 retry:
 	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, fshared, &q.key);
@@ -1670,6 +1852,20 @@ uaddr_faulted:
 	goto retry;
 }
 
+static long futex_lock_pi_restart(struct restart_block *restart)
+{
+	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
+	ktime_t t, *tp = NULL;
+	int fshared = restart->futex.flags & FLAGS_SHARED;
+
+	if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
+		t.tv64 = restart->futex.time;
+		tp = &t;
+	}
+	restart->fn = do_no_restart_syscall;
+
+	return (long)futex_lock_pi(uaddr, fshared, restart->futex.val, tp, 0);
+}
 
 /*
  * Userspace attempted a TID -> 0 atomic transition, and failed.
@@ -1772,6 +1968,290 @@ pi_faulted:
 	return ret;
 }
 
+/**
+ * futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
+ * @uaddr:	the futex we initialyl wait on (non-pi)
+ * @fshared:	whether the futexes are shared (1) or not (0).  They must be
+ * 		the same type, no requeueing from private to shared, etc.
+ * @val:	the expected value of uaddr
+ * @abs_time:	absolute timeout
+ * @bitset:	32 bit wakeup bitset set by userspace, defaults to all.
+ * @clockrt:	whether to use CLOCK_REALTIME (1) or CLOCK_MONOTONIC (0)
+ * @uaddr2:	the pi futex we will take prior to returning to user-space
+ *
+ * The caller will wait on uaddr and will be requeued by futex_requeue() to
+ * uaddr2 which must be PI aware.  Normal wakeup will wake on uaddr2 and
+ * complete the acquisition of the rt_mutex prior to returning to userspace.
+ * This ensures the rt_mutex maintains an owner when it has waiters; without
+ * one, the pi logic wouldn't know which task to boost/deboost, if there was a
+ * need to.
+ *
+ * We call schedule in futex_wait_queue_me() when we enqueue and return there
+ * via the following:
+ * 1) wakeup on uaddr2 after an atomic lock acquisition by futex_requeue()
+ * 2) wakeup on uaddr2 after a requeue and subsequent unlock
+ * 3) signal (before or after requeue)
+ * 4) timeout (before or after requeue)
+ *
+ * If 3, we setup a restart_block with futex_wait_requeue_pi() as the function.
+ *
+ * If 2, we may then block on trying to take the rt_mutex and return via:
+ * 5) successful lock
+ * 6) signal
+ * 7) timeout
+ * 8) other lock acquisition failure
+ *
+ * If 6, we setup a restart_block with futex_lock_pi() as the function.
+ *
+ * If 4 or 7, we cleanup and return with -ETIMEDOUT.
+ *
+ * Returns:
+ *  0 - On success
+ * <0 - On error
+ */
+static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
+				 u32 val, ktime_t *abs_time, u32 bitset,
+				 int clockrt, u32 __user *uaddr2)
+{
+	struct hrtimer_sleeper timeout, *to = NULL;
+	struct rt_mutex_waiter rt_waiter;
+	struct restart_block *restart;
+	struct futex_hash_bucket *hb;
+	struct rt_mutex *pi_mutex;
+	union futex_key key2;
+	struct futex_q q;
+	u32 uval;
+	int ret;
+
+	if (!bitset)
+		return -EINVAL;
+
+	if (abs_time) {
+		to = &timeout;
+		hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
+				      CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+		hrtimer_init_sleeper(to, current);
+		hrtimer_set_expires_range_ns(&to->timer, *abs_time,
+					     current->timer_slack_ns);
+	}
+
+	/*
+	 * The waiter is allocated on our stack, manipulated by the requeue
+	 * code while we sleep on uaddr.
+	 */
+	debug_rt_mutex_init_waiter(&rt_waiter);
+	rt_waiter.task = NULL;
+
+	q.pi_state = NULL;
+	q.bitset = bitset;
+	q.rt_waiter = &rt_waiter;
+
+retry:
+	q.key = FUTEX_KEY_INIT;
+	ret = get_futex_key(uaddr, fshared, &q.key);
+	if (unlikely(ret != 0))
+		goto out;
+
+	key2 = FUTEX_KEY_INIT;
+	ret = get_futex_key(uaddr2, fshared, &key2);
+	if (unlikely(ret != 0)) {
+		put_futex_key(fshared, &q.key);
+		goto out;
+	}
+
+	hb = queue_lock(&q);
+
+	/*
+	 * Access the page AFTER the hash-bucket is locked.
+	 * Order is important:
+	 *
+	 *   Userspace waiter: val = var; if (cond(val)) futex_wait(&var, val);
+	 *   Userspace waker:  if (cond(var)) { var = new; futex_wake(&var); }
+	 *
+	 * The basic logical guarantee of a futex is that it blocks ONLY
+	 * if cond(var) is known to be true at the time of blocking, for
+	 * any cond.  If we queued after testing *uaddr, that would open
+	 * a race condition where we could block indefinitely with
+	 * cond(var) false, which would violate the guarantee.
+	 *
+	 * A consequence is that futex_wait() can return zero and absorb
+	 * a wakeup when *uaddr != val on entry to the syscall.  This is
+	 * rare, but normal.
+	 */
+	ret = get_futex_value_locked(&uval, uaddr);
+
+	if (unlikely(ret)) {
+		queue_unlock(&q, hb);
+		put_futex_key(fshared, &q.key);
+		put_futex_key(fshared, &key2);
+
+		ret = get_user(uval, uaddr);
+		if (!ret)
+			goto retry;
+		goto out;
+	}
+
+	/* Only actually queue if *uaddr contained val.  */
+	ret = -EWOULDBLOCK;
+	if (uval != val) {
+		queue_unlock(&q, hb);
+		put_futex_key(fshared, &q.key);
+		put_futex_key(fshared, &key2);
+		goto out;
+	}
+
+	/* Queue the futex_q, drop the hb lock, wait for wakeup. */
+	futex_wait_queue_me(hb, &q, to);
+
+	/*
+	 * Ensure the requeue is atomic to avoid races while we process the
+	 * wakeup.  We only need to hold hb->lock to ensure atomicity as the
+	 * wakeup code can't change q.key from uaddr to uaddr2 if we hold that
+	 * lock. It can't be requeued from uaddr2 to something else since we
+	 * don't support a PI aware source futex for requeue.
+	 */
+	spin_lock(&hb->lock);
+	if (!match_futex(&q.key, &key2)) {
+		WARN_ON(q.lock_ptr && (&hb->lock != q.lock_ptr));
+		/*
+		 * We were not requeued, handle wakeup from futex1 (uaddr).  We
+		 * cannot have been unqueued and already hold the lock, no need
+		 * to call unqueue_me, just do it directly.
+		 */
+		plist_del(&q.list, &q.list.plist);
+		drop_futex_key_refs(&q.key);
+
+		ret = -ETIMEDOUT;
+		if (to && !to->task) {
+			spin_unlock(&hb->lock);
+			goto out_put_keys;
+		}
+
+		/*
+		 * We expect signal_pending(current), but another thread may
+		 * have handled it for us already.
+		 */
+		ret = -ERESTARTSYS;
+		if (!abs_time) {
+			spin_unlock(&hb->lock);
+			goto out_put_keys;
+		}
+
+		restart = &current_thread_info()->restart_block;
+		restart->fn = futex_wait_requeue_pi_restart;
+		restart->futex.uaddr = (u32 *)uaddr;
+		restart->futex.val = val;
+		restart->futex.time = abs_time->tv64;
+		restart->futex.bitset = bitset;
+		restart->futex.flags = 0;
+		restart->futex.uaddr2 = (u32 *)uaddr2;
+		restart->futex.flags = FLAGS_HAS_TIMEOUT;
+
+		if (fshared)
+			restart->futex.flags |= FLAGS_SHARED;
+		if (clockrt)
+			restart->futex.flags |= FLAGS_CLOCKRT;
+
+		ret = -ERESTART_RESTARTBLOCK;
+
+		spin_unlock(&hb->lock);
+		goto out_put_keys;
+	}
+	spin_unlock(&hb->lock);
+
+	ret = 0;
+	/*
+	 * Check if the waker acquired the second futex for us. If the lock_ptr
+	 * is NULL, but our key is key2, then the requeue target futex was
+	 * uncontended and the waker gave it to us.  This is safe without a lock
+	 * as futex_requeue() will not release the hb lock until after it's
+	 * nulled the lock_ptr and removed us from the hb.
+	 */
+	if (!q.lock_ptr)
+		goto out_put_keys;
+
+	/*
+	 * At this point we have been requeued.  We have been woken up by
+	 * futex_unlock_pi(), a timeout, or a signal, but not futex_requeue().
+	 * futex_unlock_pi() will not destroy the lock_ptr nor the pi_state.
+	 */
+	WARN_ON(!&q.pi_state);
+	pi_mutex = &q.pi_state->pi_mutex;
+	ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
+	debug_rt_mutex_free_waiter(&waiter);
+
+	spin_lock(q.lock_ptr);
+	ret = finish_futex_lock_pi(uaddr, fshared, &q, ret);
+
+	/* Unqueue and drop the lock. */
+	unqueue_me_pi(&q);
+
+	/*
+	 * If fixup_pi_state_owner() faulted and was unable to handle the
+	 * fault, unlock it and return the fault to userspace.
+	 */
+	if (ret == -EFAULT) {
+		if (rt_mutex_owner(pi_mutex) == current)
+			rt_mutex_unlock(pi_mutex);
+	} else if (ret == -EINTR) {
+		if (get_user(uval, uaddr2)) {
+			ret = -EFAULT;
+			goto out_put_keys;
+		}
+
+		/*
+		 * We've already been requeued, so restart by calling
+		 * futex_lock_pi() directly, rather then returning to this
+		 * function.
+		 */
+		restart = &current_thread_info()->restart_block;
+		restart->fn = futex_lock_pi_restart;
+		restart->futex.uaddr = (u32 *)uaddr2;
+		restart->futex.val = uval;
+		restart->futex.flags = 0;
+		if (abs_time) {
+			restart->futex.flags |= FLAGS_HAS_TIMEOUT;
+			restart->futex.time = abs_time->tv64;
+		}
+
+		if (fshared)
+			restart->futex.flags |= FLAGS_SHARED;
+		if (clockrt)
+			restart->futex.flags |= FLAGS_CLOCKRT;
+		ret = -ERESTART_RESTARTBLOCK;
+	}
+
+out_put_keys:
+	put_futex_key(fshared, &q.key);
+	put_futex_key(fshared, &key2);
+
+out:
+	if (to) {
+		hrtimer_cancel(&to->timer);
+		destroy_hrtimer_on_stack(&to->timer);
+	}
+	return ret;
+}
+
+static long futex_wait_requeue_pi_restart(struct restart_block *restart)
+{
+	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
+	u32 __user *uaddr2 = (u32 __user *)restart->futex.uaddr2;
+	int fshared = restart->futex.flags & FLAGS_SHARED;
+	int clockrt = restart->futex.flags & FLAGS_CLOCKRT;
+	ktime_t t, *tp = NULL;
+
+	if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
+		t.tv64 = restart->futex.time;
+		tp = &t;
+	}
+	restart->fn = do_no_restart_syscall;
+
+	return (long)futex_wait_requeue_pi(uaddr, fshared, restart->futex.val,
+					   tp, restart->futex.bitset, clockrt,
+					   uaddr2);
+}
+
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
  * thread exit time.
@@ -1994,7 +2474,7 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		fshared = 1;
 
 	clockrt = op & FUTEX_CLOCK_REALTIME;
-	if (clockrt && cmd != FUTEX_WAIT_BITSET)
+	if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
 		return -ENOSYS;
 
 	switch (cmd) {
@@ -2009,10 +2489,11 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		ret = futex_wake(uaddr, fshared, val, val3);
 		break;
 	case FUTEX_REQUEUE:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL);
+		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0);
 		break;
 	case FUTEX_CMP_REQUEUE:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3);
+		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
+				    0);
 		break;
 	case FUTEX_WAKE_OP:
 		ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
@@ -2029,6 +2510,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		if (futex_cmpxchg_enabled)
 			ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1);
 		break;
+	case FUTEX_WAIT_REQUEUE_PI:
+		val3 = FUTEX_BITSET_MATCH_ANY;
+		ret = futex_wait_requeue_pi(uaddr, fshared, val, timeout, val3,
+					    clockrt, uaddr2);
+		break;
+	case FUTEX_REQUEUE_PI:
+		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 1);
+		break;
+	case FUTEX_CMP_REQUEUE_PI:
+		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
+				    1);
+		break;
 	default:
 		ret = -ENOSYS;
 	}
@@ -2046,7 +2539,8 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 	int cmd = op & FUTEX_CMD_MASK;
 
 	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
-		      cmd == FUTEX_WAIT_BITSET)) {
+		      cmd == FUTEX_WAIT_BITSET ||
+		      cmd == FUTEX_WAIT_REQUEUE_PI)) {
 		if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
 			return -EFAULT;
 		if (!timespec_valid(&ts))
@@ -2058,10 +2552,11 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 		tp = &t;
 	}
 	/*
-	 * requeue parameter in 'utime' if cmd == FUTEX_REQUEUE.
+	 * requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
 	 * number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
 	 */
 	if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
+	    cmd == FUTEX_REQUEUE_PI || cmd == FUTEX_CMP_REQUEUE_PI ||
 	    cmd == FUTEX_WAKE_OP)
 		val2 = (u32) (unsigned long) utime;
 


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls
  2009-03-30 22:44     ` Darren Hart
@ 2009-03-30 23:31       ` Darren Hart
  0 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-30 23:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, Thomas Gleixner, Sripathi Kodi, Peter Zijlstra,
	John Stultz, Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Ingo Molnar, Jakub Jelinek

Darren Hart wrote:
> Eric Dumazet wrote:
> 
> Two more nice catches, thanks.  Corrected patch below.

If anyone is still wanting to pull these from git, you can grab them 
from my -dev branch.  Note: I pop and push branches to this branch, 
whereas the versioned branches will remain constant.

http://git.kernel.org/?p=linux/kernel/git/dvhart/linux-2.6-tip-hacks.git;a=shortlog;h=requeue-pi-dev

Thanks,

Darren

> 
>>> +static long futex_lock_pi_restart(struct restart_block *restart)
>>> +{
>>> +    u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
>>> +    ktime_t t, *tp = NULL;
>>> +    int fshared = restart->futex.flags & FLAGS_SHARED;
>>> +
>>> +    if (restart->futex.flags | FLAGS_HAS_TIMEOUT) {
>>
>> if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
> 
> 
>> if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
>>
>>> +        t.tv64 = restart->futex.time;
>>> +        tp = &t;
>>> +    }
>>> +    restart->fn = do_no_restart_syscall;
>>> +
>>
>>
>> Strange your compiler dit not complains...
> 
> Well, the comparison with an "|" is still valid - just happens to always
> be true :-)  I didn't get any errors - perhaps I should be compiling
> with some addition options?
> 
> 
> RFC: futex: add requeue_pi calls
> 
> From: Darren Hart <dvhltc@us.ibm.com>
> 
> PI Futexes and their underlying rt_mutex cannot be left ownerless if 
> there are
> pending waiters as this will break the PI boosting logic, so the standard
> requeue commands aren't sufficient.  The new commands properly manage pi 
> futex
> ownership by ensuring a futex with waiters has an owner at all times.  This
> will allow glibc to properly handle pi mutexes with pthread_condvars.
> 
> The approach taken here is to create two new futex op codes:
> 
> FUTEX_WAIT_REQUEUE_PI:
> Tasks will use this op code to wait on a futex (such as a non-pi waitqueue)
> and wake after they have been requeued to a pi futex.  Prior to 
> returning to
> userspace, they will acquire this pi futex (and the underlying rt_mutex).
> 
> futex_wait_requeue_pi() is the result of a high speed collision between
> futex_wait() and futex_lock_pi() (with the first part of futex_lock_pi() 
> being
> done by futex_proxy_trylock_atomic() on behalf of the top_waiter).
> 
> FUTEX_REQUEUE_PI (and FUTEX_CMP_REQUEUE_PI):
> This call must be used to wake tasks waiting with FUTEX_WAIT_REQUEUE_PI,
> regardless of how many tasks the caller intends to wake or requeue.
> pthread_cond_broadcast() should call this with nr_wake=1 and
> nr_requeue=INT_MAX.  pthread_cond_signal() should call this with 
> nr_wake=1 and
> nr_requeue=0.  The reason being we need both callers to get the benefit 
> of the
> futex_proxy_trylock_atomic() routine.  futex_requeue() also enqueues the
> top_waiter on the rt_mutex via rt_mutex_start_proxy_lock().
> 
> Changelog:
> V7pre: -Corrected FLAGS_HAS_TIMEOUT flag detection logic per Eric Dumazet
> V6: -Moved non requeue_pi related fixes/changes into separate patches
>    -Make use of new double_unlock_hb()
>    -Futex key management updates
>    -Removed unnecessary futex_requeue_pi_cleanup() routine
>    -Return -EINVAL if futex_wake is called with q.rt_waiter != NULL
>    -Rewrote futex_wait_requeue_pi() wakeup logic
>    -Rewrote requeue/wakeup loop
>    -Renamed futex_requeue_pi_init() to futex_proxy_trylock_atomic()
>    -Handle third party owner, removed -EMORON :-(
>    -Comment updates
> V5: -Update futex_requeue to allow for nr_requeue == 0
>    -Whitespace cleanup
>    -Added task_count var to futex_requeue to avoid confusion between
>     ret, res, and ret used to count wakes and requeues
> V4: -Cleanups to pass checkpatch.pl
>    -Added missing goto out; in futex_wait_requeue_pi()
>    -Moved rt_mutex_handle_wakeup to the rt_mutex_enqueue_task patch as they
>     are a functional pair.
>    -Fixed several error exit paths that failed to unqueue the futex_q, 
> which
>     not only would leave the futex_q on the hb, but would have caused an 
> exit
>     race with the waiter since they weren't synchonized on the hb lock.  
> Thanks
>     Sripathi for catching this.
>    -Fix pi_state handling in futex_requeue
>    -Several other minor fixes to futex_requeue_pi
>    -add requeue_futex function and force the requeue in requeue_pi even 
> for the
>     task we wake in the requeue loop
>    -refill the pi state cache at the beginning of futex_requeue for 
> requeue_pi
>    -have futex_requeue_pi_init ensure it stores off the pi_state for use in
>     futex_requeue
>    - Delayed starting the hrtimer until after TASK_INTERRUPTIBLE is set
>    - Fixed NULL pointer bug when futex_wait_requeue_pi() has no timer and
>      receives a signal after waking on uaddr2.  Added has_timeout to the
>      restart->futex structure.
> V3: -Added FUTEX_CMP_REQUEUE_PI op
>    -Put fshared support back.  So long as it is encoded in the op code, we
>     assume both the uaddr's are either private or share, but not mixed.
>    -Fixed access to expected value of uaddr2 in futex_wait_requeue_pi()
> V2: -Added rt_mutex enqueueing to futex_requeue_pi_init
>    -Updated fault handling and exit logic
> V1: -Initial verion
> 
> Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Sripathi Kodi <sripathik@in.ibm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: John Stultz <johnstul@us.ibm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Dinakar Guniguntala <dino@in.ibm.com>
> Cc: Ulrich Drepper <drepper@redhat.com>
> Cc: Eric Dumazet <dada1@cosmosbay.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Jakub Jelinek <jakub@redhat.com>
> ---
> 
> include/linux/futex.h       |    8 +
> include/linux/thread_info.h |    3 kernel/futex.c              |  533 
> +++++++++++++++++++++++++++++++++++++++++--
> 3 files changed, 524 insertions(+), 20 deletions(-)
> 
> 
> diff --git a/include/linux/futex.h b/include/linux/futex.h
> index 3bf5bb5..b05519c 100644
> --- a/include/linux/futex.h
> +++ b/include/linux/futex.h
> @@ -23,6 +23,9 @@ union ktime;
> #define FUTEX_TRYLOCK_PI    8
> #define FUTEX_WAIT_BITSET    9
> #define FUTEX_WAKE_BITSET    10
> +#define FUTEX_WAIT_REQUEUE_PI    11
> +#define FUTEX_REQUEUE_PI    12
> +#define FUTEX_CMP_REQUEUE_PI    13
> 
> #define FUTEX_PRIVATE_FLAG    128
> #define FUTEX_CLOCK_REALTIME    256
> @@ -38,6 +41,11 @@ union ktime;
> #define FUTEX_TRYLOCK_PI_PRIVATE (FUTEX_TRYLOCK_PI | FUTEX_PRIVATE_FLAG)
> #define FUTEX_WAIT_BITSET_PRIVATE    (FUTEX_WAIT_BITS | FUTEX_PRIVATE_FLAG)
> #define FUTEX_WAKE_BITSET_PRIVATE    (FUTEX_WAKE_BITS | FUTEX_PRIVATE_FLAG)
> +#define FUTEX_WAIT_REQUEUE_PI_PRIVATE    (FUTEX_WAIT_REQUEUE_PI | \
> +                     FUTEX_PRIVATE_FLAG)
> +#define FUTEX_REQUEUE_PI_PRIVATE    (FUTEX_REQUEUE_PI | 
> FUTEX_PRIVATE_FLAG)
> +#define FUTEX_CMP_REQUEUE_PI_PRIVATE    (FUTEX_CMP_REQUEUE_PI | \
> +                     FUTEX_PRIVATE_FLAG)
> 
> /*
>  * Support for robust futexes: the kernel cleans up held futexes at
> diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h
> index e6b820f..a8cc4e1 100644
> --- a/include/linux/thread_info.h
> +++ b/include/linux/thread_info.h
> @@ -21,13 +21,14 @@ struct restart_block {
>         struct {
>             unsigned long arg0, arg1, arg2, arg3;
>         };
> -        /* For futex_wait */
> +        /* For futex_wait and futex_wait_requeue_pi */
>         struct {
>             u32 *uaddr;
>             u32 val;
>             u32 flags;
>             u32 bitset;
>             u64 time;
> +            u32 *uaddr2;
>         } futex;
>         /* For nanosleep */
>         struct {
> diff --git a/kernel/futex.c b/kernel/futex.c
> index a9c7da1..115ec52 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -19,6 +19,10 @@
>  *  PRIVATE futexes by Eric Dumazet
>  *  Copyright (C) 2007 Eric Dumazet <dada1@cosmosbay.com>
>  *
> + *  Requeue-PI support by Darren Hart <dvhltc@us.ibm.com>
> + *  Copyright (C) IBM Corporation, 2009
> + *  Thanks to Thomas Gleixner for conceptual design and careful reviews.
> + *
>  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
>  *  enough at me, Linus for the original (flawed) idea, Matthew
>  *  Kirkwood for proof-of-concept implementation.
> @@ -109,6 +113,9 @@ struct futex_q {
>     struct futex_pi_state *pi_state;
>     struct task_struct *task;
> 
> +    /* rt_waiter storage for requeue_pi: */
> +    struct rt_mutex_waiter *rt_waiter;
> +
>     /* Bitset for the optional bitmasked wakeup */
>     u32 bitset;
> };
> @@ -829,7 +836,7 @@ static int futex_wake(u32 __user *uaddr, int 
> fshared, int nr_wake, u32 bitset)
> 
>     plist_for_each_entry_safe(this, next, head, list) {
>         if (match_futex (&this->key, &key)) {
> -            if (this->pi_state) {
> +            if (this->pi_state || this->rt_waiter) {
>                 ret = -EINVAL;
>                 break;
>             }
> @@ -970,20 +977,116 @@ void requeue_futex(struct futex_q *q, struct 
> futex_hash_bucket *hb1,
>     q->key = *key2;
> }
> 
> -/*
> - * Requeue all waiters hashed on one physical page to another
> - * physical page.
> +/**
> + * futex_proxy_trylock_atomic() - Attempt an atomic lock for the top 
> waiter
> + * @pifutex:    the user address of the to futex
> + * @hb1:    the from futex hash bucket, must be locked by the caller
> + * @hb2:    the to futex hash bucket, must be locked by the caller
> + * @key1:    the from futex key
> + * @key2:    the to futex key
> + *
> + * Try and get the lock on behalf of the top waiter if we can do it 
> atomically.
> + * Wake the top waiter if we succeed.  hb1 and hb2 must be held by the 
> caller.
> + *
> + * Faults occur for two primary reasons at this point:
> + * 1) The address isn't mapped
> + * 2) The address isn't writeable
> + *
> + * We return EFAULT on either of these cases and rely on the caller to 
> handle
> + * them.
> + *
> + * Returns:
> + *  0 - failed to acquire the lock atomicly
> + *  1 - acquired the lock
> + * <0 - error
> + */
> +static int futex_proxy_trylock_atomic(u32 __user *pifutex,
> +                 struct futex_hash_bucket *hb1,
> +                 struct futex_hash_bucket *hb2,
> +                 union futex_key *key1, union futex_key *key2,
> +                 struct futex_pi_state **ps)
> +{
> +    struct futex_q *top_waiter;
> +    u32 curval;
> +    int ret;
> +
> +    if (get_futex_value_locked(&curval, pifutex))
> +        return -EFAULT;
> +
> +    top_waiter = futex_top_waiter(hb1, key1);
> +
> +    /* There are no waiters, nothing for us to do. */
> +    if (!top_waiter)
> +        return 0;
> +
> +    /*
> +     * Either take the lock for top_waiter or set the FUTEX_WAITERS bit.
> +     * The pi_state is returned in ps in contended cases.
> +     */
> +    ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task);
> +    if (ret == 1) {
> +        /*
> +         * Set the top_waiter key for the requeue target futex so the
> +         * waiter can detect the wakeup on the right futex, but remove
> +         * it from the hb so it can detect atomic lock acquisition.
> +         */
> +        drop_futex_key_refs(&top_waiter->key);
> +        get_futex_key_refs(key2);
> +        top_waiter->key = *key2;
> +        WARN_ON(plist_node_empty(&top_waiter->list));
> +        plist_del(&top_waiter->list, &top_waiter->list.plist);
> +        /*
> +         * FIXME: wake_futex() wakes first, then nulls the lock_ptr,
> +         * and uses a memory barrier.  Do we need to?
> +         */
> +        top_waiter->lock_ptr = NULL;
> +        wake_up(&top_waiter->waiter);
> +    }
> +
> +    return ret;
> +}
> +
> +/**
> + * futex_requeue() - Requeue waiters from uaddr1 to uaddr2
> + * uaddr1:    source futex user address
> + * uaddr2:    target futex user address
> + * nr_wake:    number of waiters to wake (must be 1 for requeue_pi)
> + * nr_requeue:    number of waiters to requeue (0-INT_MAX)
> + * requeue_pi:    if we are attempting to requeue from a non-pi futex to a
> + *         pi futex (pi to pi requeue is not supported)
> + *
> + * Requeue waiters on uaddr1 to uaddr2. In the requeue_pi case, try to 
> acquire
> + * uaddr2 atomically on behalf of the top waiter.
> + *
> + * Returns:
> + * >=0: on success, the number of tasks requeued or woken
> + *  <0: on error
>  */
> static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user 
> *uaddr2,
> -             int nr_wake, int nr_requeue, u32 *cmpval)
> +             int nr_wake, int nr_requeue, u32 *cmpval,
> +             int requeue_pi)
> {
>     union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
> +    int drop_count = 0, task_count = 0, ret;
> +    struct futex_pi_state *pi_state = NULL;
>     struct futex_hash_bucket *hb1, *hb2;
>     struct plist_head *head1;
>     struct futex_q *this, *next;
> -    int ret, drop_count = 0;
> +    u32 curval2;
> +
> +    if (requeue_pi) {
> +        if (refill_pi_state_cache())
> +            return -ENOMEM;
> +        if (nr_wake != 1)
> +            return -EINVAL;
> +    }
> 
> retry:
> +    if (pi_state != NULL) {
> +        free_pi_state(pi_state);
> +        pi_state = NULL;
> +    }
> +
>     ret = get_futex_key(uaddr1, fshared, &key1);
>     if (unlikely(ret != 0))
>         goto out;
> @@ -1022,19 +1125,92 @@ retry_private:
>         }
>     }
> 
> +    if (requeue_pi) {
> +        /* Attempt to acquire uaddr2 and wake the top_waiter. */
> +        ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
> +                         &key2, &pi_state);
> +
> +        /*
> +         * At this point the top_waiter has either taken uaddr2 or is
> +         * waiting on it.  If the former, then the pi_state will not
> +         * exist yet, look it up one more time to ensure we have a
> +         * reference to it.
> +         */
> +        if (ret == 1 && !pi_state) {
> +            task_count++;
> +            ret = get_futex_value_locked(&curval2, uaddr2);
> +            if (!ret)
> +                ret = lookup_pi_state(curval2, hb2, &key2,
> +                              &pi_state);
> +        }
> +
> +        switch (ret) {
> +        case 0:
> +            break;
> +        case -EFAULT:
> +            double_unlock_hb(hb1, hb2);
> +            put_futex_key(fshared, &key2);
> +            put_futex_key(fshared, &key1);
> +            ret = get_user(curval2, uaddr2);
> +            if (!ret)
> +                goto retry;
> +            goto out;
> +        case -EAGAIN:
> +            /* The owner was exiting, try again. */
> +            double_unlock_hb(hb1, hb2);
> +            put_futex_key(fshared, &key2);
> +            put_futex_key(fshared, &key1);
> +            cond_resched();
> +            goto retry;
> +        default:
> +            goto out_unlock;
> +        }
> +    }
> +
>     head1 = &hb1->chain;
>     plist_for_each_entry_safe(this, next, head1, list) {
> -        if (!match_futex (&this->key, &key1))
> +        if (task_count - nr_wake >= nr_requeue)
> +            break;
> +
> +        if (!match_futex(&this->key, &key1))
>             continue;
> -        if (++ret <= nr_wake) {
> +
> +        /* This can go after we're satisfied with testing. */
> +        if (!requeue_pi)
> +            WARN_ON(this->rt_waiter);
> +
> +        /*
> +         * Wake nr_wake waiters.  For requeue_pi, if we acquired the
> +         * lock, we already woke the top_waiter.  If not, it will be
> +         * woken by futex_unlock_pi().
> +         */
> +        if (++task_count <= nr_wake && !requeue_pi) {
>             wake_futex(this);
> -        } else {
> -            requeue_futex(this, hb1, hb2, &key2);
> -            drop_count++;
> +            continue;
> +        }
> 
> -            if (ret - nr_wake >= nr_requeue)
> -                break;
> +        /*
> +         * Requeue nr_requeue waiters and possibly one more in the case
> +         * of requeue_pi if we couldn't acquire the lock atomically.
> +         */
> +        if (requeue_pi) {
> +            /* This can go after we're satisfied with testing. */
> +            WARN_ON(!this->rt_waiter);
> +
> +            /* Prepare the waiter to take the rt_mutex. */
> +            atomic_inc(&pi_state->refcount);
> +            this->pi_state = pi_state;
> +            ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
> +                            this->rt_waiter,
> +                            this->task, 1);
> +            if (ret) {
> +                this->pi_state = NULL;
> +                free_pi_state(pi_state);
> +                goto out_unlock;
> +            }
>         }
> +        requeue_futex(this, hb1, hb2, &key2);
> +        drop_count++;
>     }
> 
> out_unlock:
> @@ -1049,7 +1225,9 @@ out_put_keys:
> out_put_key1:
>     put_futex_key(fshared, &key1);
> out:
> -    return ret;
> +    if (pi_state != NULL)
> +        free_pi_state(pi_state);
> +    return ret ? ret : task_count;
> }
> 
> /* The key must be already stored in q->key. */
> @@ -1272,6 +1450,8 @@ handle_fault:
> #define FLAGS_HAS_TIMEOUT    0x04
> 
> static long futex_wait_restart(struct restart_block *restart);
> +static long futex_wait_requeue_pi_restart(struct restart_block *restart);
> +static long futex_lock_pi_restart(struct restart_block *restart);
> 
> /**
>  * finish_futex_lock_pi() - Post lock pi_state and corner case management
> @@ -1419,6 +1599,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
> 
>     q.pi_state = NULL;
>     q.bitset = bitset;
> +    q.rt_waiter = NULL;
> 
>     if (abs_time) {
>         unsigned long slack;
> @@ -1575,6 +1756,7 @@ static int futex_lock_pi(u32 __user *uaddr, int 
> fshared,
>     }
> 
>     q.pi_state = NULL;
> +    q.rt_waiter = NULL;
> retry:
>     q.key = FUTEX_KEY_INIT;
>     ret = get_futex_key(uaddr, fshared, &q.key);
> @@ -1670,6 +1852,20 @@ uaddr_faulted:
>     goto retry;
> }
> 
> +static long futex_lock_pi_restart(struct restart_block *restart)
> +{
> +    u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
> +    ktime_t t, *tp = NULL;
> +    int fshared = restart->futex.flags & FLAGS_SHARED;
> +
> +    if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
> +        t.tv64 = restart->futex.time;
> +        tp = &t;
> +    }
> +    restart->fn = do_no_restart_syscall;
> +
> +    return (long)futex_lock_pi(uaddr, fshared, restart->futex.val, tp, 0);
> +}
> 
> /*
>  * Userspace attempted a TID -> 0 atomic transition, and failed.
> @@ -1772,6 +1968,290 @@ pi_faulted:
>     return ret;
> }
> 
> +/**
> + * futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
> + * @uaddr:    the futex we initialyl wait on (non-pi)
> + * @fshared:    whether the futexes are shared (1) or not (0).  They 
> must be
> + *         the same type, no requeueing from private to shared, etc.
> + * @val:    the expected value of uaddr
> + * @abs_time:    absolute timeout
> + * @bitset:    32 bit wakeup bitset set by userspace, defaults to all.
> + * @clockrt:    whether to use CLOCK_REALTIME (1) or CLOCK_MONOTONIC (0)
> + * @uaddr2:    the pi futex we will take prior to returning to user-space
> + *
> + * The caller will wait on uaddr and will be requeued by 
> futex_requeue() to
> + * uaddr2 which must be PI aware.  Normal wakeup will wake on uaddr2 and
> + * complete the acquisition of the rt_mutex prior to returning to 
> userspace.
> + * This ensures the rt_mutex maintains an owner when it has waiters; 
> without
> + * one, the pi logic wouldn't know which task to boost/deboost, if 
> there was a
> + * need to.
> + *
> + * We call schedule in futex_wait_queue_me() when we enqueue and return 
> there
> + * via the following:
> + * 1) wakeup on uaddr2 after an atomic lock acquisition by futex_requeue()
> + * 2) wakeup on uaddr2 after a requeue and subsequent unlock
> + * 3) signal (before or after requeue)
> + * 4) timeout (before or after requeue)
> + *
> + * If 3, we setup a restart_block with futex_wait_requeue_pi() as the 
> function.
> + *
> + * If 2, we may then block on trying to take the rt_mutex and return via:
> + * 5) successful lock
> + * 6) signal
> + * 7) timeout
> + * 8) other lock acquisition failure
> + *
> + * If 6, we setup a restart_block with futex_lock_pi() as the function.
> + *
> + * If 4 or 7, we cleanup and return with -ETIMEDOUT.
> + *
> + * Returns:
> + *  0 - On success
> + * <0 - On error
> + */
> +static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
> +                 u32 val, ktime_t *abs_time, u32 bitset,
> +                 int clockrt, u32 __user *uaddr2)
> +{
> +    struct hrtimer_sleeper timeout, *to = NULL;
> +    struct rt_mutex_waiter rt_waiter;
> +    struct restart_block *restart;
> +    struct futex_hash_bucket *hb;
> +    struct rt_mutex *pi_mutex;
> +    union futex_key key2;
> +    struct futex_q q;
> +    u32 uval;
> +    int ret;
> +
> +    if (!bitset)
> +        return -EINVAL;
> +
> +    if (abs_time) {
> +        to = &timeout;
> +        hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
> +                      CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> +        hrtimer_init_sleeper(to, current);
> +        hrtimer_set_expires_range_ns(&to->timer, *abs_time,
> +                         current->timer_slack_ns);
> +    }
> +
> +    /*
> +     * The waiter is allocated on our stack, manipulated by the requeue
> +     * code while we sleep on uaddr.
> +     */
> +    debug_rt_mutex_init_waiter(&rt_waiter);
> +    rt_waiter.task = NULL;
> +
> +    q.pi_state = NULL;
> +    q.bitset = bitset;
> +    q.rt_waiter = &rt_waiter;
> +
> +retry:
> +    q.key = FUTEX_KEY_INIT;
> +    ret = get_futex_key(uaddr, fshared, &q.key);
> +    if (unlikely(ret != 0))
> +        goto out;
> +
> +    key2 = FUTEX_KEY_INIT;
> +    ret = get_futex_key(uaddr2, fshared, &key2);
> +    if (unlikely(ret != 0)) {
> +        put_futex_key(fshared, &q.key);
> +        goto out;
> +    }
> +
> +    hb = queue_lock(&q);
> +
> +    /*
> +     * Access the page AFTER the hash-bucket is locked.
> +     * Order is important:
> +     *
> +     *   Userspace waiter: val = var; if (cond(val)) futex_wait(&var, 
> val);
> +     *   Userspace waker:  if (cond(var)) { var = new; futex_wake(&var); }
> +     *
> +     * The basic logical guarantee of a futex is that it blocks ONLY
> +     * if cond(var) is known to be true at the time of blocking, for
> +     * any cond.  If we queued after testing *uaddr, that would open
> +     * a race condition where we could block indefinitely with
> +     * cond(var) false, which would violate the guarantee.
> +     *
> +     * A consequence is that futex_wait() can return zero and absorb
> +     * a wakeup when *uaddr != val on entry to the syscall.  This is
> +     * rare, but normal.
> +     */
> +    ret = get_futex_value_locked(&uval, uaddr);
> +
> +    if (unlikely(ret)) {
> +        queue_unlock(&q, hb);
> +        put_futex_key(fshared, &q.key);
> +        put_futex_key(fshared, &key2);
> +
> +        ret = get_user(uval, uaddr);
> +        if (!ret)
> +            goto retry;
> +        goto out;
> +    }
> +
> +    /* Only actually queue if *uaddr contained val.  */
> +    ret = -EWOULDBLOCK;
> +    if (uval != val) {
> +        queue_unlock(&q, hb);
> +        put_futex_key(fshared, &q.key);
> +        put_futex_key(fshared, &key2);
> +        goto out;
> +    }
> +
> +    /* Queue the futex_q, drop the hb lock, wait for wakeup. */
> +    futex_wait_queue_me(hb, &q, to);
> +
> +    /*
> +     * Ensure the requeue is atomic to avoid races while we process the
> +     * wakeup.  We only need to hold hb->lock to ensure atomicity as the
> +     * wakeup code can't change q.key from uaddr to uaddr2 if we hold that
> +     * lock. It can't be requeued from uaddr2 to something else since we
> +     * don't support a PI aware source futex for requeue.
> +     */
> +    spin_lock(&hb->lock);
> +    if (!match_futex(&q.key, &key2)) {
> +        WARN_ON(q.lock_ptr && (&hb->lock != q.lock_ptr));
> +        /*
> +         * We were not requeued, handle wakeup from futex1 (uaddr).  We
> +         * cannot have been unqueued and already hold the lock, no need
> +         * to call unqueue_me, just do it directly.
> +         */
> +        plist_del(&q.list, &q.list.plist);
> +        drop_futex_key_refs(&q.key);
> +
> +        ret = -ETIMEDOUT;
> +        if (to && !to->task) {
> +            spin_unlock(&hb->lock);
> +            goto out_put_keys;
> +        }
> +
> +        /*
> +         * We expect signal_pending(current), but another thread may
> +         * have handled it for us already.
> +         */
> +        ret = -ERESTARTSYS;
> +        if (!abs_time) {
> +            spin_unlock(&hb->lock);
> +            goto out_put_keys;
> +        }
> +
> +        restart = &current_thread_info()->restart_block;
> +        restart->fn = futex_wait_requeue_pi_restart;
> +        restart->futex.uaddr = (u32 *)uaddr;
> +        restart->futex.val = val;
> +        restart->futex.time = abs_time->tv64;
> +        restart->futex.bitset = bitset;
> +        restart->futex.flags = 0;
> +        restart->futex.uaddr2 = (u32 *)uaddr2;
> +        restart->futex.flags = FLAGS_HAS_TIMEOUT;
> +
> +        if (fshared)
> +            restart->futex.flags |= FLAGS_SHARED;
> +        if (clockrt)
> +            restart->futex.flags |= FLAGS_CLOCKRT;
> +
> +        ret = -ERESTART_RESTARTBLOCK;
> +
> +        spin_unlock(&hb->lock);
> +        goto out_put_keys;
> +    }
> +    spin_unlock(&hb->lock);
> +
> +    ret = 0;
> +    /*
> +     * Check if the waker acquired the second futex for us. If the 
> lock_ptr
> +     * is NULL, but our key is key2, then the requeue target futex was
> +     * uncontended and the waker gave it to us.  This is safe without a 
> lock
> +     * as futex_requeue() will not release the hb lock until after it's
> +     * nulled the lock_ptr and removed us from the hb.
> +     */
> +    if (!q.lock_ptr)
> +        goto out_put_keys;
> +
> +    /*
> +     * At this point we have been requeued.  We have been woken up by
> +     * futex_unlock_pi(), a timeout, or a signal, but not futex_requeue().
> +     * futex_unlock_pi() will not destroy the lock_ptr nor the pi_state.
> +     */
> +    WARN_ON(!&q.pi_state);
> +    pi_mutex = &q.pi_state->pi_mutex;
> +    ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
> +    debug_rt_mutex_free_waiter(&waiter);
> +
> +    spin_lock(q.lock_ptr);
> +    ret = finish_futex_lock_pi(uaddr, fshared, &q, ret);
> +
> +    /* Unqueue and drop the lock. */
> +    unqueue_me_pi(&q);
> +
> +    /*
> +     * If fixup_pi_state_owner() faulted and was unable to handle the
> +     * fault, unlock it and return the fault to userspace.
> +     */
> +    if (ret == -EFAULT) {
> +        if (rt_mutex_owner(pi_mutex) == current)
> +            rt_mutex_unlock(pi_mutex);
> +    } else if (ret == -EINTR) {
> +        if (get_user(uval, uaddr2)) {
> +            ret = -EFAULT;
> +            goto out_put_keys;
> +        }
> +
> +        /*
> +         * We've already been requeued, so restart by calling
> +         * futex_lock_pi() directly, rather then returning to this
> +         * function.
> +         */
> +        restart = &current_thread_info()->restart_block;
> +        restart->fn = futex_lock_pi_restart;
> +        restart->futex.uaddr = (u32 *)uaddr2;
> +        restart->futex.val = uval;
> +        restart->futex.flags = 0;
> +        if (abs_time) {
> +            restart->futex.flags |= FLAGS_HAS_TIMEOUT;
> +            restart->futex.time = abs_time->tv64;
> +        }
> +
> +        if (fshared)
> +            restart->futex.flags |= FLAGS_SHARED;
> +        if (clockrt)
> +            restart->futex.flags |= FLAGS_CLOCKRT;
> +        ret = -ERESTART_RESTARTBLOCK;
> +    }
> +
> +out_put_keys:
> +    put_futex_key(fshared, &q.key);
> +    put_futex_key(fshared, &key2);
> +
> +out:
> +    if (to) {
> +        hrtimer_cancel(&to->timer);
> +        destroy_hrtimer_on_stack(&to->timer);
> +    }
> +    return ret;
> +}
> +
> +static long futex_wait_requeue_pi_restart(struct restart_block *restart)
> +{
> +    u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
> +    u32 __user *uaddr2 = (u32 __user *)restart->futex.uaddr2;
> +    int fshared = restart->futex.flags & FLAGS_SHARED;
> +    int clockrt = restart->futex.flags & FLAGS_CLOCKRT;
> +    ktime_t t, *tp = NULL;
> +
> +    if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
> +        t.tv64 = restart->futex.time;
> +        tp = &t;
> +    }
> +    restart->fn = do_no_restart_syscall;
> +
> +    return (long)futex_wait_requeue_pi(uaddr, fshared, restart->futex.val,
> +                       tp, restart->futex.bitset, clockrt,
> +                       uaddr2);
> +}
> +
> /*
>  * Support for robust futexes: the kernel cleans up held futexes at
>  * thread exit time.
> @@ -1994,7 +2474,7 @@ long do_futex(u32 __user *uaddr, int op, u32 val, 
> ktime_t *timeout,
>         fshared = 1;
> 
>     clockrt = op & FUTEX_CLOCK_REALTIME;
> -    if (clockrt && cmd != FUTEX_WAIT_BITSET)
> +    if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != 
> FUTEX_WAIT_REQUEUE_PI)
>         return -ENOSYS;
> 
>     switch (cmd) {
> @@ -2009,10 +2489,11 @@ long do_futex(u32 __user *uaddr, int op, u32 
> val, ktime_t *timeout,
>         ret = futex_wake(uaddr, fshared, val, val3);
>         break;
>     case FUTEX_REQUEUE:
> -        ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL);
> +        ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0);
>         break;
>     case FUTEX_CMP_REQUEUE:
> -        ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3);
> +        ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
> +                    0);
>         break;
>     case FUTEX_WAKE_OP:
>         ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
> @@ -2029,6 +2510,18 @@ long do_futex(u32 __user *uaddr, int op, u32 val, 
> ktime_t *timeout,
>         if (futex_cmpxchg_enabled)
>             ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1);
>         break;
> +    case FUTEX_WAIT_REQUEUE_PI:
> +        val3 = FUTEX_BITSET_MATCH_ANY;
> +        ret = futex_wait_requeue_pi(uaddr, fshared, val, timeout, val3,
> +                        clockrt, uaddr2);
> +        break;
> +    case FUTEX_REQUEUE_PI:
> +        ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 1);
> +        break;
> +    case FUTEX_CMP_REQUEUE_PI:
> +        ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
> +                    1);
> +        break;
>     default:
>         ret = -ENOSYS;
>     }
> @@ -2046,7 +2539,8 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, 
> op, u32, val,
>     int cmd = op & FUTEX_CMD_MASK;
> 
>     if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
> -              cmd == FUTEX_WAIT_BITSET)) {
> +              cmd == FUTEX_WAIT_BITSET ||
> +              cmd == FUTEX_WAIT_REQUEUE_PI)) {
>         if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
>             return -EFAULT;
>         if (!timespec_valid(&ts))
> @@ -2058,10 +2552,11 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, 
> op, u32, val,
>         tp = &t;
>     }
>     /*
> -     * requeue parameter in 'utime' if cmd == FUTEX_REQUEUE.
> +     * requeue parameter in 'utime' if cmd == FUTEX_*_REQUEUE_*.
>      * number of waiters to wake in 'utime' if cmd == FUTEX_WAKE_OP.
>      */
>     if (cmd == FUTEX_REQUEUE || cmd == FUTEX_CMP_REQUEUE ||
> +        cmd == FUTEX_REQUEUE_PI || cmd == FUTEX_CMP_REQUEUE_PI ||
>         cmd == FUTEX_WAKE_OP)
>         val2 = (u32) (unsigned long) utime;
> 
> 
> 


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 0/8] requeue pi implementation
  2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
                   ` (8 preceding siblings ...)
  2009-03-30 21:55 ` glibc hacks for requeue_pi Darren Hart
@ 2009-03-31  2:09 ` Steven Rostedt
  2009-03-31  4:48   ` Darren Hart
  9 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2009-03-31  2:09 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Thomas Gleixner, Sripathi Kodi, Peter Zijlstra,
	John Stultz, Dinakar Guniguntala, Ulrich Drepper, Eric Dumazet,
	Ingo Molnar, Jakub Jelinek




On Mon, 30 Mar 2009, Darren Hart wrote:
> 
> Darren Hart (8):
>       RFC: futex: add requeue_pi calls
>       RFC: futex: Add requeue_futex() call
>       RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags

I just finished reviewing the first three patches, and I don't see 
anything wrong with them. I'll review more tomorrow, but just for now, you 
can add on to the first three:

Reviewed-by: Steven Rostedt <srostedt@redhat.com>

-- Steve


>       RFC: rt_mutex: add proxy lock routines
>       RFC: futex: finish_futex_lock_pi()
>       RFC: futex: futex_lock_pi_atomic()
>       RFC: futex: futex_top_waiter()
>       RFC: futex: futex_wait_queue_me()
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 0/8] requeue pi implementation
  2009-03-31  2:09 ` [tip PATCH v6 0/8] requeue pi implementation Steven Rostedt
@ 2009-03-31  4:48   ` Darren Hart
  0 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-31  4:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-kernel, Thomas Gleixner, Sripathi Kodi, Peter Zijlstra,
	John Stultz, Dinakar Guniguntala, Ulrich Drepper, Eric Dumazet,
	Ingo Molnar, Jakub Jelinek

Steven Rostedt wrote:
> 
> 
> On Mon, 30 Mar 2009, Darren Hart wrote:
>> Darren Hart (8):
>>       RFC: futex: add requeue_pi calls
>>       RFC: futex: Add requeue_futex() call
>>       RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags
> 
> I just finished reviewing the first three patches, and I don't see 
> anything wrong with them. I'll review more tomorrow, but just for now, you 
> can add on to the first three:
> 
> Reviewed-by: Steven Rostedt <srostedt@redhat.com>

Thanks Steven!

I presume you mean the FIRST three, as in 0/8, 1/8, and 2/8 as opposed 
to the top three above which would be 6/8, 7/8, and 8/8 ?

> 
> -- Steve
> 
> 
>>       RFC: rt_mutex: add proxy lock routines
>>       RFC: futex: finish_futex_lock_pi()
>>       RFC: futex: futex_lock_pi_atomic()
>>       RFC: futex: futex_top_waiter()
>>       RFC: futex: futex_wait_queue_me()
>>


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 1/8] RFC: futex: futex_wait_queue_me()
  2009-03-30 21:37 ` [tip PATCH v6 1/8] RFC: futex: futex_wait_queue_me() Darren Hart
@ 2009-03-31  6:44   ` Thomas Gleixner
  2009-03-31 14:58     ` Darren Hart
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2009-03-31  6:44 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Sripathi Kodi,
	Peter Zijlstra, John Stultz, Steven Rostedt, Dinakar Guniguntala,
	Ulrich Drepper, Eric Dumazet, Ingo Molnar, Jakub Jelinek

On Mon, 30 Mar 2009, Darren Hart wrote:
> +
> +	/* add_wait_queue is the barrier after __set_current_state. */
> +	__set_current_state(TASK_INTERRUPTIBLE);
> +	add_wait_queue(&q->waiter, &wait);
> +	/*
> +	 * NOTE: we don't remove ourselves from the waitqueue because
> +	 * we are the only user of it.
> +	 */

  This comment, while correct is at an odd place.

> +	/* Arm the timer */
> +	if (timeout) {
> +		hrtimer_start_expires(&timeout->timer, HRTIMER_MODE_ABS);
> +		if (!hrtimer_active(&timeout->timer))
> +			timeout->task = NULL;
> +	}
> +
> +	/*
> +	 * !plist_node_empty() is safe here without any lock.
> +	 * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
> +	 */
> +	if (likely(!plist_node_empty(&q->list))) {
> +		/*
> +		 * If the timer has already expired, current will already be
> +		 * flagged for rescheduling. Only call schedule if there
> +		 * is no timeout, or if it has yet to expire.
> +		 */
> +		if (!timeout || likely(timeout->task))

  Remove the likely(). It does not make sense

> +			schedule();
> +	}
> +	__set_current_state(TASK_RUNNING);
> +}
> +
>  static int futex_wait(u32 __user *uaddr, int fshared,
>  		      u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
>  {
> -	struct task_struct *curr = current;
> +	struct hrtimer_sleeper timeout, *to = NULL;
>  	struct restart_block *restart;
> -	DECLARE_WAITQUEUE(wait, curr);
>  	struct futex_hash_bucket *hb;
>  	struct futex_q q;
>  	u32 uval;
>  	int ret;
> -	struct hrtimer_sleeper t;
> -	int rem = 0;
>  
>  	if (!bitset)
>  		return -EINVAL;
>  
>  	q.pi_state = NULL;
>  	q.bitset = bitset;
> +
> +	if (abs_time) {
> +		unsigned long slack;

  missing new line

> +		to = &timeout;
> +		slack = current->timer_slack_ns;
> +		if (rt_task(current))
> +			slack = 0;

  Hmm. I thought we would use current->timer_slack_ns ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 3/8] RFC: futex: futex_lock_pi_atomic()
  2009-03-30 21:37 ` [tip PATCH v6 3/8] RFC: futex: futex_lock_pi_atomic() Darren Hart
@ 2009-03-31  6:49   ` Thomas Gleixner
  2009-03-31 15:00     ` Darren Hart
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2009-03-31  6:49 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Sripathi Kodi,
	Peter Zijlstra, John Stultz, Steven Rostedt, Dinakar Guniguntala,
	Ulrich Drepper, Eric Dumazet, Ingo Molnar, Jakub Jelinek

On Mon, 30 Mar 2009, Darren Hart wrote:
> +static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
> +				union futex_key *key,
> +				struct futex_pi_state **ps,
> +				struct task_struct *task)
> +{
> +	u32 uval, newval, curval;
> +	int lock_taken;
> +	int ret;
> +	int ownerdied = 0;

  int lock_taken, ret, ownerdied = 0;

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls
  2009-03-30 21:38 ` [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls Darren Hart
  2009-03-30 21:47   ` Eric Dumazet
@ 2009-03-31  7:30   ` Thomas Gleixner
  2009-03-31 18:16     ` Darren Hart
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2009-03-31  7:30 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek, Sripathi Kodi,
	Peter Zijlstra, John Stultz, Steven Rostedt, Dinakar Guniguntala,
	Ulrich Drepper, Eric Dumazet, Ingo Molnar, Jakub Jelinek

On Mon, 30 Mar 2009, Darren Hart wrote:
> +
> +	if (requeue_pi) {
> +		if (refill_pi_state_cache())
> +			return -ENOMEM;
> +		if (nr_wake != 1)
> +			return -EINVAL;
> +	}

  Needs a comment
  
>  retry:
> +	if (pi_state != NULL) {
> +		free_pi_state(pi_state);
> +		pi_state = NULL;
> +	}

  Ditto

> +	if (requeue_pi) {
> +		/* Attempt to acquire uaddr2 and wake the top_waiter. */
> +		ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
> +						 &key2, &pi_state);
> +
> +		/*
> +		 * At this point the top_waiter has either taken uaddr2 or is
> +		 * waiting on it.  If the former, then the pi_state will not
> +		 * exist yet, look it up one more time to ensure we have a
> +		 * reference to it.
> +		 */
> +		if (ret == 1 && !pi_state) {
> +			task_count++;
> +			ret = get_futex_value_locked(&curval2, uaddr2);
> +			if (!ret)
> +				ret = lookup_pi_state(curval2, hb2, &key2,
> +						      &pi_state);
> +		}
> +
> +		switch (ret) {
> +		case 0:
> +			break;
> +		case -EFAULT:
> +			double_unlock_hb(hb1, hb2);
> +			put_futex_key(fshared, &key2);
> +			put_futex_key(fshared, &key1);
> +			ret = get_user(curval2, uaddr2);
> +			if (!ret)
> +				goto retry;

  Hmm. What happens if futex_proxy_trylock_atomic() succeeds, but
  get_futex_value_locked() fails ? I guess its unlikely to happen, but
  in case it does are we sure that our state is correct ?

> +
> +		/* This can go after we're satisfied with testing. */
> +		if (!requeue_pi)
> +			WARN_ON(this->rt_waiter);

  Keep those WARN_ONs. they are useful when code is changed later. Just make it
  
	WARN_ON(!requeue_pi && this->rt_waiter);
	WARN_ON(requeue_pi && !this->rt_waiter);

> +		/*
> +		 * Requeue nr_requeue waiters and possibly one more in the case
> +		 * of requeue_pi if we couldn't acquire the lock atomically.
> +		 */
> +		if (requeue_pi) {
> +			/* This can go after we're satisfied with testing. */
> +			WARN_ON(!this->rt_waiter);
> +
> +			/* Prepare the waiter to take the rt_mutex. */
> +			atomic_inc(&pi_state->refcount);
> +			this->pi_state = pi_state;
> +			ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
> +							this->rt_waiter,
> +							this->task, 1);
> +			if (ret) {
> +				this->pi_state = NULL;
> +				free_pi_state(pi_state);
> +				goto out_unlock;

  What are the reasons of dropping out here ?

> +	/*
> +	 * Ensure the requeue is atomic to avoid races while we process the
> +	 * wakeup.  We only need to hold hb->lock to ensure atomicity as the
> +	 * wakeup code can't change q.key from uaddr to uaddr2 if we hold that
> +	 * lock. It can't be requeued from uaddr2 to something else since we
> +	 * don't support a PI aware source futex for requeue.
> +	 */
> +	spin_lock(&hb->lock);
> +	if (!match_futex(&q.key, &key2)) {
> +		WARN_ON(q.lock_ptr && (&hb->lock != q.lock_ptr));
> +		/*
> +		 * We were not requeued, handle wakeup from futex1 (uaddr).  We
> +		 * cannot have been unqueued and already hold the lock, no need
> +		 * to call unqueue_me, just do it directly.
> +		 */
> +		plist_del(&q.list, &q.list.plist);
> +		drop_futex_key_refs(&q.key);
> +
> +		ret = -ETIMEDOUT;
> +		if (to && !to->task) {
> +			spin_unlock(&hb->lock);
> +			goto out_put_keys;
> +		}
> +
> +		/*
> +		 * We expect signal_pending(current), but another thread may
> +		 * have handled it for us already.
> +		 */
> +		ret = -ERESTARTSYS;
> +		if (!abs_time) {
> +			spin_unlock(&hb->lock);
> +			goto out_put_keys;
> +		}

  Hmm. Why do we use restart block here ? The timeout is absolute, so
  we can just restart the syscall, can't we ?

> +	ret = 0;
> +	/*
> +	 * Check if the waker acquired the second futex for us. If the lock_ptr
> +	 * is NULL, but our key is key2, then the requeue target futex was
> +	 * uncontended and the waker gave it to us.  This is safe without a lock
> +	 * as futex_requeue() will not release the hb lock until after it's
> +	 * nulled the lock_ptr and removed us from the hb.
> +	 */
> +	if (!q.lock_ptr)
> +		goto out_put_keys;

  At this point we hold the rt_mutex, the pi state is correct and the
  user space variable at *uaddr2 is updated, right ??

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 1/8] RFC: futex: futex_wait_queue_me()
  2009-03-31  6:44   ` Thomas Gleixner
@ 2009-03-31 14:58     ` Darren Hart
  0 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-31 14:58 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

Thomas Gleixner wrote:
> On Mon, 30 Mar 2009, Darren Hart wrote:
>> +
>> +	/* add_wait_queue is the barrier after __set_current_state. */
>> +	__set_current_state(TASK_INTERRUPTIBLE);
>> +	add_wait_queue(&q->waiter, &wait);
>> +	/*
>> +	 * NOTE: we don't remove ourselves from the waitqueue because
>> +	 * we are the only user of it.
>> +	 */
> 
>   This comment, while correct is at an odd place.

How about something like this:

	/* add_wait_queue is the barrier after __set_current_state. */
	__set_current_state(TASK_INTERRUPTIBLE);

	/*
	 * Add current as the futex_q waiter.  We don't remove ourselves from
	 * the wait_queue because we are the only user of it.
	 */
	add_wait_queue(&q->waiter, &wait);


> 
>> +	/* Arm the timer */
>> +	if (timeout) {
>> +		hrtimer_start_expires(&timeout->timer, HRTIMER_MODE_ABS);
>> +		if (!hrtimer_active(&timeout->timer))
>> +			timeout->task = NULL;
>> +	}
>> +
>> +	/*
>> +	 * !plist_node_empty() is safe here without any lock.
>> +	 * q.lock_ptr != 0 is not safe, because of ordering against wakeup.
>> +	 */
>> +	if (likely(!plist_node_empty(&q->list))) {
>> +		/*
>> +		 * If the timer has already expired, current will already be
>> +		 * flagged for rescheduling. Only call schedule if there
>> +		 * is no timeout, or if it has yet to expire.
>> +		 */
>> +		if (!timeout || likely(timeout->task))
> 
>   Remove the likely(). It does not make sense

Done.

> 
>> +			schedule();
>> +	}
>> +	__set_current_state(TASK_RUNNING);
>> +}
>> +
>>  static int futex_wait(u32 __user *uaddr, int fshared,
>>  		      u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
>>  {
>> -	struct task_struct *curr = current;
>> +	struct hrtimer_sleeper timeout, *to = NULL;
>>  	struct restart_block *restart;
>> -	DECLARE_WAITQUEUE(wait, curr);
>>  	struct futex_hash_bucket *hb;
>>  	struct futex_q q;
>>  	u32 uval;
>>  	int ret;
>> -	struct hrtimer_sleeper t;
>> -	int rem = 0;
>>  
>>  	if (!bitset)
>>  		return -EINVAL;
>>  
>>  	q.pi_state = NULL;
>>  	q.bitset = bitset;
>> +
>> +	if (abs_time) {
>> +		unsigned long slack;
> 
>   missing new line
> 
>> +		to = &timeout;
>> +		slack = current->timer_slack_ns;
>> +		if (rt_task(current))
>> +			slack = 0;
> 
>   Hmm. I thought we would use current->timer_slack_ns ?

Hrm, right, so long as I'm changing futex_wait I might as well correct this.  Done (and voids the missing new line comment above).

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 3/8] RFC: futex: futex_lock_pi_atomic()
  2009-03-31  6:49   ` Thomas Gleixner
@ 2009-03-31 15:00     ` Darren Hart
  0 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-31 15:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

Thomas Gleixner wrote:
> On Mon, 30 Mar 2009, Darren Hart wrote:
>> +static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
>> +				union futex_key *key,
>> +				struct futex_pi_state **ps,
>> +				struct task_struct *task)
>> +{
>> +	u32 uval, newval, curval;
>> +	int lock_taken;
>> +	int ret;
>> +	int ownerdied = 0;
> 
>   int lock_taken, ret, ownerdied = 0;
>

Fixed, and reordered to meet preferred reverse line length order of 
declarations:

	int lock_taken, ret, ownerdied = 0;
	u32 uval, newval, curval;


Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls
  2009-03-31  7:30   ` Thomas Gleixner
@ 2009-03-31 18:16     ` Darren Hart
  0 siblings, 0 replies; 23+ messages in thread
From: Darren Hart @ 2009-03-31 18:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Sripathi Kodi, Peter Zijlstra, John Stultz,
	Steven Rostedt, Dinakar Guniguntala, Ulrich Drepper,
	Eric Dumazet, Ingo Molnar, Jakub Jelinek

Thomas Gleixner wrote:
> On Mon, 30 Mar 2009, Darren Hart wrote:
>> +
>> +	if (requeue_pi) {
>> +		if (refill_pi_state_cache())
>> +			return -ENOMEM;
>> +		if (nr_wake != 1)
>> +			return -EINVAL;
>> +	}
> 
>   Needs a comment

	/*
	 * requeue_pi requires a pi_state, try to allocate it now in case it
	 * fails.  requeue_pi can wake at most one task since it acquires the
	 * rt_mutex prior to returning to userspace, and must wake it if it can
	 * so as to not leave the rt_mutex with waiters and no owner.
	 */

>   
>>  retry:
>> +	if (pi_state != NULL) {
>> +		free_pi_state(pi_state);
>> +		pi_state = NULL;
>> +	}
> 
>   Ditto


	/*
	 * We will have to lookup the pi_state again, so
	 * free this one to keep the accounting correct.
	 */


 
>> +	if (requeue_pi) {
>> +		/* Attempt to acquire uaddr2 and wake the top_waiter. */
>> +		ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
>> +						 &key2, &pi_state);
>> +
>> +		/*
>> +		 * At this point the top_waiter has either taken uaddr2 or is
>> +		 * waiting on it.  If the former, then the pi_state will not
>> +		 * exist yet, look it up one more time to ensure we have a
>> +		 * reference to it.
>> +		 */
>> +		if (ret == 1 && !pi_state) {
>> +			task_count++;
>> +			ret = get_futex_value_locked(&curval2, uaddr2);
>> +			if (!ret)
>> +				ret = lookup_pi_state(curval2, hb2, &key2,
>> +						      &pi_state);
>> +		}
>> +
>> +		switch (ret) {
>> +		case 0:
>> +			break;
>> +		case -EFAULT:
>> +			double_unlock_hb(hb1, hb2);
>> +			put_futex_key(fshared, &key2);
>> +			put_futex_key(fshared, &key1);
>> +			ret = get_user(curval2, uaddr2);
>> +			if (!ret)
>> +				goto retry;
> 
>   Hmm. What happens if futex_proxy_trylock_atomic() succeeds, but
>   get_futex_value_locked() fails ? I guess its unlikely to happen, but
>   in case it does are we sure that our state is correct ?

Right so...

First, I suppose the !pi_state in that if block above should be turned
into a WARN_ON as ret == 1 should imply !pi_state.

If get_futex_value_locked() faults, the uaddr2 still contains the
top_waiter's pid and that task is on it's way waking up.  We unlock hash
buckets which will allow for unlocking (maybe the top_waiter is REALLY
fast) or for additional waiters to be added.  If we fault while not
atomic (get_user() above) then we goto out, returning a fatal fault to
userspace, but the top_waiter will continue to wake.  I think this is
OK.  We can't return both the number woken AND an error code without
passing another userspace variable for the number woken/requeued (which
might not be a bad idea actually).

If we don't fault (most likely) we goto retry which will not free the
pi_state since it will be NULL.  We grab the keys and the hb locks, do
the cmpval test (un uaddr, not uaddr2, so we're still ok).  We'll then
try to take the lock atomically again.  We may actually succeed again if
the top_waiter was really fast and dropped uaddr2 already...

HRM... this means we might wake additional tasks... so for requeue_pi
nr_wake is really ignored... we'll wake however many waiters we can/have
to in order to ensure a contended rt_mutex always has an owner... which
raises the question of is it ok to return >1 when user called with
nr_wake=1 and nr_requeue=0 (for cond_signal)?

If the lock is still contended, the next futex_proxy_trylock_atomic()
will fail and we'll continue to requeue the remaining tasks.

So the only concern is the muddied relevance of nr_wake.... Any thoughts
on how we should treat nr_wake?  I suppose if nr_wake=1 and
nr_requeue=0, then we should avoid calling futex_proxy_trylock_atomic()
a second time... something like this:

	if (requeue_pi && (task_count - nr_wake < nr_requeue)) {
		/* Attempt to acquire uaddr2 and wake the top_waiter. */
		ret = futex_proxy_trylock_atomic(uaddr2, hb1, hb2, &key1,
						 &key2, &pi_state);



> 
>> +
>> +		/* This can go after we're satisfied with testing. */
>> +		if (!requeue_pi)
>> +			WARN_ON(this->rt_waiter);
> 
>   Keep those WARN_ONs. they are useful when code is changed later. Just make it
>   
> 	WARN_ON(!requeue_pi && this->rt_waiter);
> 	WARN_ON(requeue_pi && !this->rt_waiter);

Makes sense, done.

> 
>> +		/*
>> +		 * Requeue nr_requeue waiters and possibly one more in the case
>> +		 * of requeue_pi if we couldn't acquire the lock atomically.
>> +		 */
>> +		if (requeue_pi) {
>> +			/* This can go after we're satisfied with testing. */
>> +			WARN_ON(!this->rt_waiter);
>> +
>> +			/* Prepare the waiter to take the rt_mutex. */
>> +			atomic_inc(&pi_state->refcount);
>> +			this->pi_state = pi_state;
>> +			ret = rt_mutex_start_proxy_lock(&pi_state->pi_mutex,
>> +							this->rt_waiter,
>> +							this->task, 1);
>> +			if (ret) {
>> +				this->pi_state = NULL;
>> +				free_pi_state(pi_state);
>> +				goto out_unlock;
> 
>   What are the reasons of dropping out here ?

Eeek.  Good catch.

if ret == 1, then we need to wake-up the waiter much like we do in
futex_proxy_trylock_atomic().

if ret < 0 (-EDEADLK), I think we should return the error to userspace
immediately, aborting any pending requeues - thoughts?

> 
>> +	/*
>> +	 * Ensure the requeue is atomic to avoid races while we process the
>> +	 * wakeup.  We only need to hold hb->lock to ensure atomicity as the
>> +	 * wakeup code can't change q.key from uaddr to uaddr2 if we hold that
>> +	 * lock. It can't be requeued from uaddr2 to something else since we
>> +	 * don't support a PI aware source futex for requeue.
>> +	 */
>> +	spin_lock(&hb->lock);
>> +	if (!match_futex(&q.key, &key2)) {
>> +		WARN_ON(q.lock_ptr && (&hb->lock != q.lock_ptr));
>> +		/*
>> +		 * We were not requeued, handle wakeup from futex1 (uaddr).  We
>> +		 * cannot have been unqueued and already hold the lock, no need
>> +		 * to call unqueue_me, just do it directly.
>> +		 */
>> +		plist_del(&q.list, &q.list.plist);
>> +		drop_futex_key_refs(&q.key);
>> +
>> +		ret = -ETIMEDOUT;
>> +		if (to && !to->task) {
>> +			spin_unlock(&hb->lock);
>> +			goto out_put_keys;
>> +		}
>> +
>> +		/*
>> +		 * We expect signal_pending(current), but another thread may
>> +		 * have handled it for us already.
>> +		 */
>> +		ret = -ERESTARTSYS;
>> +		if (!abs_time) {
>> +			spin_unlock(&hb->lock);
>> +			goto out_put_keys;
>> +		}
> 
>   Hmm. Why do we use restart block here ? The timeout is absolute, so
>   we can just restart the syscall, can't we ?

Yup, I caught this last night too.  I'll update.

> 
>> +	ret = 0;
>> +	/*
>> +	 * Check if the waker acquired the second futex for us. If the lock_ptr
>> +	 * is NULL, but our key is key2, then the requeue target futex was
>> +	 * uncontended and the waker gave it to us.  This is safe without a lock
>> +	 * as futex_requeue() will not release the hb lock until after it's
>> +	 * nulled the lock_ptr and removed us from the hb.
>> +	 */
>> +	if (!q.lock_ptr)
>> +		goto out_put_keys;
> 
>   At this point we hold the rt_mutex, the pi state is correct and the
>   user space variable at *uaddr2 is updated, right ??

uaddr2 has already been updated by futex_lock_pi_atomic()

we do hold the rt_mutex if there is one (a waiter would have to have
been added after we look the lock and setup the pi_state, creating the
rt_mutex, and assigning us as the owner)

we only get to this point if we acquired the uncontended lock, so there is no lock stealing to worry about, I believe this means we can avoid the corner case cleanup of finish_futex_lock_pi().  I'll pour over this a bit more though and make sure I can convince myself of that.

Thanks you for the review!



> 
> Thanks,
> 
> 	tglx


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-03-31 18:17 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-30 21:37 [tip PATCH v6 0/8] requeue pi implementation Darren Hart
2009-03-30 21:37 ` [tip PATCH v6 1/8] RFC: futex: futex_wait_queue_me() Darren Hart
2009-03-31  6:44   ` Thomas Gleixner
2009-03-31 14:58     ` Darren Hart
2009-03-30 21:37 ` [tip PATCH v6 2/8] RFC: futex: futex_top_waiter() Darren Hart
2009-03-30 21:37 ` [tip PATCH v6 3/8] RFC: futex: futex_lock_pi_atomic() Darren Hart
2009-03-31  6:49   ` Thomas Gleixner
2009-03-31 15:00     ` Darren Hart
2009-03-30 21:38 ` [tip PATCH v6 4/8] RFC: futex: finish_futex_lock_pi() Darren Hart
2009-03-30 21:38 ` [tip PATCH v6 5/8] RFC: rt_mutex: add proxy lock routines Darren Hart
2009-03-30 21:38 ` [tip PATCH v6 6/8] RFC: futex: Add FUTEX_HAS_TIMEOUT flag to restart.futex.flags Darren Hart
2009-03-30 21:40   ` Eric Dumazet
2009-03-30 22:40     ` Darren Hart
2009-03-30 21:38 ` [tip PATCH v6 7/8] RFC: futex: Add requeue_futex() call Darren Hart
2009-03-30 21:38 ` [tip PATCH v6 8/8] RFC: futex: add requeue_pi calls Darren Hart
2009-03-30 21:47   ` Eric Dumazet
2009-03-30 22:44     ` Darren Hart
2009-03-30 23:31       ` Darren Hart
2009-03-31  7:30   ` Thomas Gleixner
2009-03-31 18:16     ` Darren Hart
2009-03-30 21:55 ` glibc hacks for requeue_pi Darren Hart
2009-03-31  2:09 ` [tip PATCH v6 0/8] requeue pi implementation Steven Rostedt
2009-03-31  4:48   ` Darren Hart

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.