* [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
@ 2010-04-05 20:23 Darren Hart
  2010-04-05 20:23 ` [PATCH 1/6] futex: replace fshared and clockrt with combined flags Darren Hart
                   ` (8 more replies)
  0 siblings, 9 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-05 20:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity

In-Reply-To: 

NOT FOR INCLUSION

The following patch series implements a new experimental kernel side futex mutex
via new FUTEX_LOCK and FUTEX_LOCK_ADAPTIVE futex op codes. The adaptive spin
follows the kernel mutex model of allowing one spinner until the lock is
released or the owner is descheduled. The patch currently allows the user to
specify if they want no spinning, a single adaptive spinner, or multiple
spinners (aggressive adaptive spinning, or aas... which I have mistyped as "ass"
enough times to realize a better term is indeed required :-).
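
For illustration, here is a minimal sketch of the userspace side this
interface expects. The futex_mutex_t type and helper are illustrative only
(not part of this series); the op code values are those proposed in patch
4/6:

#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <linux/futex.h>

#ifndef FUTEX_LOCK
#define FUTEX_LOCK		13	/* op codes proposed by this series */
#define FUTEX_LOCK_ADAPTIVE	14
#endif

typedef struct { unsigned int val; } futex_mutex_t;

/* Fast path: a 0 -> TID transition acquires the lock with no syscall.
 * Slow path: FUTEX_LOCK_ADAPTIVE has the kernel spin and/or block on our
 * behalf (use FUTEX_LOCK for no kernel-side spinning). */
static void futex_mutex_lock(futex_mutex_t *m, pid_t tid)
{
	if (__sync_bool_compare_and_swap(&m->val, 0, (unsigned int)tid))
		return;
	syscall(SYS_futex, &m->val, FUTEX_LOCK_ADAPTIVE, 0, NULL, NULL, 0);
}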

I'm using the futex_lock branch of my futextest test suite to gather results.
The test is performance/futex_lock.c and can be checked out here:

git://git.kernel.org/pub/scm/linux/kernel/git/dvhart/futextest.git

Per Avi's suggestion I added asm instruction based period and duty-cycle options
to do some testing. A series of plots with 256 threads, one per period length,
is shown at the URL below. Each plot compares raw futex lock (normal, no
spinning) with a single spinner (adaptive) and with multiple spinners (aas) for
each of several duty-cycles to determine performance at various levels of
contention. The test measures the number of lock/unlock pairs performed per
second (in thousands).
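
Roughly, each thread holds the lock for duty_cycle percent of a
period-length instruction loop and releases it for the remainder. The
sketch below shows the shape of that workload; the real test is
performance/futex_lock.c in the repository above, and burn_instructions()
and worker() here are illustrative stand-ins:

/* Reuses the illustrative futex_mutex_t helpers sketched in the cover
 * letter; futex_mutex_unlock() is sketched with patch 4 below. */
static void burn_instructions(int n)
{
	volatile int i;

	for (i = 0; i < n; i++)
		;	/* roughly n iterations of busy work */
}

static void worker(futex_mutex_t *lock, pid_t tid, int period,
		   int duty_cycle, unsigned long iterations)
{
	unsigned long i;
	int hold = period * duty_cycle / 100;

	for (i = 0; i < iterations; i++) {
		futex_mutex_lock(lock, tid);
		burn_instructions(hold);		/* critical section */
		futex_mutex_unlock(lock);
		burn_instructions(period - hold);	/* uncontended work */
	}
}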

http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v4/

Avi's suggested starting point was 1000 instructions with a 10% duty cycle.
That plot "Futex Lock/Unlock (Period=1000, Threads=256)" does appear to be the
most interesting of the lot. It's not so short that other overhead becomes the
bottleneck, and not so long as to make adaptive spinning benefits show up
as noise in the numbers. The adaptive spin (with a single waiter) consistently
beats the normal run, and outperforms aas for most duty-cycles. I rechecked a
few points on this plot to confirm and the relative scores remained consistent.
These plots were generated using 10,000,000 iterations per datapoint.

Lastly, I should mention that these results all underperform a simple
pthread_mutex_lock()/pthread_mutex_unlock() pair. I'm looking into why but felt
I was overdue in sharing what I have to date. A test comparing this to a
sched_yield() style userspace spinlock would probably be more appropriate.

Thanks,

Darren Hart
IBM Linux Technology Center
Real-Time Linux Team



* [PATCH 1/6] futex: replace fshared and clockrt with combined flags
  2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
@ 2010-04-05 20:23 ` Darren Hart
  2010-04-05 20:23 ` [PATCH 2/6] futex: add futex_q static initializer Darren Hart
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-05 20:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Darren Hart

In the early days we passed the mmap sem around. That became the
"int fshared" with the fast gup improvements. Then we added
"int clockrt" in places. This patch unifies these options as "flags" and
cleans up various calls which had fshared in their signature but no
longer used it.

V4: Fix an operator precedence bug

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 kernel/futex.c |  215 +++++++++++++++++++++++++++-----------------------------
 1 files changed, 104 insertions(+), 111 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index e7a35f1..ed08cfd 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -69,6 +69,14 @@ int __read_mostly futex_cmpxchg_enabled;
 #define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
 
 /*
+ * Futex flags used to encode options to functions and preserve them across
+ * restarts.
+ */
+#define FLAGS_SHARED		0x01
+#define FLAGS_CLOCKRT		0x02
+#define FLAGS_HAS_TIMEOUT	0x04
+
+/*
  * Priority Inheritance state:
  */
 struct futex_pi_state {
@@ -283,7 +291,7 @@ again:
 }
 
 static inline
-void put_futex_key(int fshared, union futex_key *key)
+void put_futex_key(union futex_key *key)
 {
 	drop_futex_key_refs(key);
 }
@@ -878,7 +886,7 @@ double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
 /*
  * Wake up waiters matching bitset queued on this futex (uaddr).
  */
-static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
+static int futex_wake(u32 __user *uaddr, int flags, int nr_wake, u32 bitset)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
@@ -889,7 +897,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 	if (!bitset)
 		return -EINVAL;
 
-	ret = get_futex_key(uaddr, fshared, &key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -915,7 +923,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 	}
 
 	spin_unlock(&hb->lock);
-	put_futex_key(fshared, &key);
+	put_futex_key(&key);
 out:
 	return ret;
 }
@@ -925,7 +933,7 @@ out:
  * to this virtual address:
  */
 static int
-futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
+futex_wake_op(u32 __user *uaddr1, int flags, u32 __user *uaddr2,
 	      int nr_wake, int nr_wake2, int op)
 {
 	union futex_key key1 = FUTEX_KEY_INIT, key2 = FUTEX_KEY_INIT;
@@ -935,10 +943,10 @@ futex_wake_op(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
 	int ret, op_ret;
 
 retry:
-	ret = get_futex_key(uaddr1, fshared, &key1);
+	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, fshared, &key2);
+	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
 	if (unlikely(ret != 0))
 		goto out_put_key1;
 
@@ -970,11 +978,11 @@ retry_private:
 		if (ret)
 			goto out_put_keys;
 
-		if (!fshared)
+		if (!(flags & FLAGS_SHARED))
 			goto retry_private;
 
-		put_futex_key(fshared, &key2);
-		put_futex_key(fshared, &key1);
+		put_futex_key(&key2);
+		put_futex_key(&key1);
 		goto retry;
 	}
 
@@ -1004,9 +1012,9 @@ retry_private:
 
 	double_unlock_hb(hb1, hb2);
 out_put_keys:
-	put_futex_key(fshared, &key2);
+	put_futex_key(&key2);
 out_put_key1:
-	put_futex_key(fshared, &key1);
+	put_futex_key(&key1);
 out:
 	return ret;
 }
@@ -1140,11 +1148,13 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 
 /**
  * futex_requeue() - Requeue waiters from uaddr1 to uaddr2
- * uaddr1:	source futex user address
- * uaddr2:	target futex user address
- * nr_wake:	number of waiters to wake (must be 1 for requeue_pi)
- * nr_requeue:	number of waiters to requeue (0-INT_MAX)
- * requeue_pi:	if we are attempting to requeue from a non-pi futex to a
+ * @uaddr1:	source futex user address
+ * @flags:	futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, etc.); both futexes
+ * 		must be the same type: no requeueing from private to shared, etc.
+ * @uaddr2:	target futex user address
+ * @nr_wake:	number of waiters to wake (must be 1 for requeue_pi)
+ * @nr_requeue:	number of waiters to requeue (0-INT_MAX)
+ * @requeue_pi:	if we are attempting to requeue from a non-pi futex to a
  * 		pi futex (pi to pi requeue is not supported)
  *
  * Requeue waiters on uaddr1 to uaddr2. In the requeue_pi case, try to acquire
@@ -1154,7 +1164,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
  * >=0 - on success, the number of tasks requeued or woken
  *  <0 - on error
  */
-static int futex_requeue(u32 __user *uaddr1, int fshared, u32 __user *uaddr2,
+static int futex_requeue(u32 __user *uaddr1, int flags, u32 __user *uaddr2,
 			 int nr_wake, int nr_requeue, u32 *cmpval,
 			 int requeue_pi)
 {
@@ -1197,10 +1207,10 @@ retry:
 		pi_state = NULL;
 	}
 
-	ret = get_futex_key(uaddr1, fshared, &key1);
+	ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1);
 	if (unlikely(ret != 0))
 		goto out;
-	ret = get_futex_key(uaddr2, fshared, &key2);
+	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
 	if (unlikely(ret != 0))
 		goto out_put_key1;
 
@@ -1222,11 +1232,11 @@ retry_private:
 			if (ret)
 				goto out_put_keys;
 
-			if (!fshared)
+			if (!(flags & FLAGS_SHARED))
 				goto retry_private;
 
-			put_futex_key(fshared, &key2);
-			put_futex_key(fshared, &key1);
+			put_futex_key(&key2);
+			put_futex_key(&key1);
 			goto retry;
 		}
 		if (curval != *cmpval) {
@@ -1266,8 +1276,8 @@ retry_private:
 			break;
 		case -EFAULT:
 			double_unlock_hb(hb1, hb2);
-			put_futex_key(fshared, &key2);
-			put_futex_key(fshared, &key1);
+			put_futex_key(&key2);
+			put_futex_key(&key1);
 			ret = fault_in_user_writeable(uaddr2);
 			if (!ret)
 				goto retry;
@@ -1275,8 +1285,8 @@ retry_private:
 		case -EAGAIN:
 			/* The owner was exiting, try again. */
 			double_unlock_hb(hb1, hb2);
-			put_futex_key(fshared, &key2);
-			put_futex_key(fshared, &key1);
+			put_futex_key(&key2);
+			put_futex_key(&key1);
 			cond_resched();
 			goto retry;
 		default:
@@ -1358,9 +1368,9 @@ out_unlock:
 		drop_futex_key_refs(&key1);
 
 out_put_keys:
-	put_futex_key(fshared, &key2);
+	put_futex_key(&key2);
 out_put_key1:
-	put_futex_key(fshared, &key1);
+	put_futex_key(&key1);
 out:
 	if (pi_state != NULL)
 		free_pi_state(pi_state);
@@ -1500,7 +1510,7 @@ static void unqueue_me_pi(struct futex_q *q)
  * private futexes.
  */
 static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
-				struct task_struct *newowner, int fshared)
+				struct task_struct *newowner)
 {
 	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
@@ -1593,20 +1603,11 @@ handle_fault:
 	goto retry;
 }
 
-/*
- * In case we must use restart_block to restart a futex_wait,
- * we encode in the 'flags' shared capability
- */
-#define FLAGS_SHARED		0x01
-#define FLAGS_CLOCKRT		0x02
-#define FLAGS_HAS_TIMEOUT	0x04
-
 static long futex_wait_restart(struct restart_block *restart);
 
 /**
  * fixup_owner() - Post lock pi_state and corner case management
  * @uaddr:	user address of the futex
- * @fshared:	whether the futex is shared (1) or not (0)
  * @q:		futex_q (contains pi_state and access to the rt_mutex)
  * @locked:	if the attempt to take the rt_mutex succeeded (1) or not (0)
  *
@@ -1619,8 +1620,7 @@ static long futex_wait_restart(struct restart_block *restart);
  *  0 - success, lock not taken
  * <0 - on error (-EFAULT)
  */
-static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q,
-		       int locked)
+static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
 {
 	struct task_struct *owner;
 	int ret = 0;
@@ -1631,7 +1631,7 @@ static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q,
 		 * did a lock-steal - fix up the PI-state in that case:
 		 */
 		if (q->pi_state->owner != current)
-			ret = fixup_pi_state_owner(uaddr, q, current, fshared);
+			ret = fixup_pi_state_owner(uaddr, q, current);
 		goto out;
 	}
 
@@ -1658,7 +1658,7 @@ static int fixup_owner(u32 __user *uaddr, int fshared, struct futex_q *q,
 		 * lock. Fix the state up.
 		 */
 		owner = rt_mutex_owner(&q->pi_state->pi_mutex);
-		ret = fixup_pi_state_owner(uaddr, q, owner, fshared);
+		ret = fixup_pi_state_owner(uaddr, q, owner);
 		goto out;
 	}
 
@@ -1721,7 +1721,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
  * futex_wait_setup() - Prepare to wait on a futex
  * @uaddr:	the futex userspace address
  * @val:	the expected value
- * @fshared:	whether the futex is shared (1) or not (0)
+ * @flags:	futex flags (FLAGS_SHARED, etc.)
  * @q:		the associated futex_q
  * @hb:		storage for hash_bucket pointer to be returned to caller
  *
@@ -1734,7 +1734,7 @@ static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
  *  0 - uaddr contains val and hb has been locked
 * <0 - -EFAULT or -EWOULDBLOCK (uaddr does not contain val) and hb is unlocked
  */
-static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
+static int futex_wait_setup(u32 __user *uaddr, u32 val, int flags,
 			   struct futex_q *q, struct futex_hash_bucket **hb)
 {
 	u32 uval;
@@ -1759,7 +1759,7 @@ static int futex_wait_setup(u32 __user *uaddr, u32 val, int fshared,
 	 */
 retry:
 	q->key = FUTEX_KEY_INIT;
-	ret = get_futex_key(uaddr, fshared, &q->key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q->key);
 	if (unlikely(ret != 0))
 		return ret;
 
@@ -1775,10 +1775,10 @@ retry_private:
 		if (ret)
 			goto out;
 
-		if (!fshared)
+		if (!(flags & FLAGS_SHARED))
 			goto retry_private;
 
-		put_futex_key(fshared, &q->key);
+		put_futex_key(&q->key);
 		goto retry;
 	}
 
@@ -1789,12 +1789,12 @@ retry_private:
 
 out:
 	if (ret)
-		put_futex_key(fshared, &q->key);
+		put_futex_key(&q->key);
 	return ret;
 }
 
-static int futex_wait(u32 __user *uaddr, int fshared,
-		      u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
+static int futex_wait(u32 __user *uaddr, int flags, u32 val, ktime_t *abs_time,
+		      u32 bitset)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct restart_block *restart;
@@ -1813,8 +1813,9 @@ static int futex_wait(u32 __user *uaddr, int fshared,
 	if (abs_time) {
 		to = &timeout;
 
-		hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
-				      CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+		hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
+				      CLOCK_REALTIME : CLOCK_MONOTONIC,
+				      HRTIMER_MODE_ABS);
 		hrtimer_init_sleeper(to, current);
 		hrtimer_set_expires_range_ns(&to->timer, *abs_time,
 					     current->timer_slack_ns);
@@ -1822,7 +1823,7 @@ static int futex_wait(u32 __user *uaddr, int fshared,
 
 retry:
 	/* Prepare to wait on uaddr. */
-	ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+	ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
 	if (ret)
 		goto out;
 
@@ -1842,7 +1843,7 @@ retry:
 	 * victim of a spurious wakeup as well.
 	 */
 	if (!signal_pending(current)) {
-		put_futex_key(fshared, &q.key);
+		put_futex_key(&q.key);
 		goto retry;
 	}
 
@@ -1856,17 +1857,12 @@ retry:
 	restart->futex.val = val;
 	restart->futex.time = abs_time->tv64;
 	restart->futex.bitset = bitset;
-	restart->futex.flags = FLAGS_HAS_TIMEOUT;
-
-	if (fshared)
-		restart->futex.flags |= FLAGS_SHARED;
-	if (clockrt)
-		restart->futex.flags |= FLAGS_CLOCKRT;
+	restart->futex.flags = flags;
 
 	ret = -ERESTART_RESTARTBLOCK;
 
 out_put_key:
-	put_futex_key(fshared, &q.key);
+	put_futex_key(&q.key);
 out:
 	if (to) {
 		hrtimer_cancel(&to->timer);
@@ -1879,7 +1875,6 @@ out:
 static long futex_wait_restart(struct restart_block *restart)
 {
 	u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
-	int fshared = 0;
 	ktime_t t, *tp = NULL;
 
 	if (restart->futex.flags & FLAGS_HAS_TIMEOUT) {
@@ -1887,11 +1882,9 @@ static long futex_wait_restart(struct restart_block *restart)
 		tp = &t;
 	}
 	restart->fn = do_no_restart_syscall;
-	if (restart->futex.flags & FLAGS_SHARED)
-		fshared = 1;
-	return (long)futex_wait(uaddr, fshared, restart->futex.val, tp,
-				restart->futex.bitset,
-				restart->futex.flags & FLAGS_CLOCKRT);
+
+	return (long)futex_wait(uaddr, restart->futex.flags,
+				restart->futex.val, tp, restart->futex.bitset);
 }
 
 
@@ -1901,8 +1894,8 @@ static long futex_wait_restart(struct restart_block *restart)
  * if there are waiters then it will block, it does PI, etc. (Due to
  * races the kernel might see a 0 value of the futex too.)
  */
-static int futex_lock_pi(u32 __user *uaddr, int fshared,
-			 int detect, ktime_t *time, int trylock)
+static int futex_lock_pi(u32 __user *uaddr, int flags, int detect,
+			 ktime_t *time, int trylock)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct futex_hash_bucket *hb;
@@ -1925,7 +1918,7 @@ static int futex_lock_pi(u32 __user *uaddr, int fshared,
 	q.requeue_pi_key = NULL;
 retry:
 	q.key = FUTEX_KEY_INIT;
-	ret = get_futex_key(uaddr, fshared, &q.key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -1947,7 +1940,7 @@ retry_private:
 			 * exit to complete.
 			 */
 			queue_unlock(&q, hb);
-			put_futex_key(fshared, &q.key);
+			put_futex_key(&q.key);
 			cond_resched();
 			goto retry;
 		default:
@@ -1977,7 +1970,7 @@ retry_private:
 	 * Fixup the pi_state owner and possibly acquire the lock if we
 	 * haven't already.
 	 */
-	res = fixup_owner(uaddr, fshared, &q, !ret);
+	res = fixup_owner(uaddr, &q, !ret);
 	/*
 	 * If fixup_owner() returned an error, propagate that.  If it acquired
 	 * the lock, clear our -ETIMEDOUT or -EINTR.
@@ -2001,7 +1994,7 @@ out_unlock_put_key:
 	queue_unlock(&q, hb);
 
 out_put_key:
-	put_futex_key(fshared, &q.key);
+	put_futex_key(&q.key);
 out:
 	if (to)
 		destroy_hrtimer_on_stack(&to->timer);
@@ -2014,10 +2007,10 @@ uaddr_faulted:
 	if (ret)
 		goto out_put_key;
 
-	if (!fshared)
+	if (!(flags & FLAGS_SHARED))
 		goto retry_private;
 
-	put_futex_key(fshared, &q.key);
+	put_futex_key(&q.key);
 	goto retry;
 }
 
@@ -2026,7 +2019,7 @@ uaddr_faulted:
  * This is the in-kernel slowpath: we look up the PI state (if any),
  * and do the rt-mutex unlock.
  */
-static int futex_unlock_pi(u32 __user *uaddr, int fshared)
+static int futex_unlock_pi(u32 __user *uaddr, int flags)
 {
 	struct futex_hash_bucket *hb;
 	struct futex_q *this, *next;
@@ -2044,7 +2037,7 @@ retry:
 	if ((uval & FUTEX_TID_MASK) != task_pid_vnr(current))
 		return -EPERM;
 
-	ret = get_futex_key(uaddr, fshared, &key);
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &key);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -2099,14 +2092,14 @@ retry:
 
 out_unlock:
 	spin_unlock(&hb->lock);
-	put_futex_key(fshared, &key);
+	put_futex_key(&key);
 
 out:
 	return ret;
 
 pi_faulted:
 	spin_unlock(&hb->lock);
-	put_futex_key(fshared, &key);
+	put_futex_key(&key);
 
 	ret = fault_in_user_writeable(uaddr);
 	if (!ret)
@@ -2166,7 +2159,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
 /**
  * futex_wait_requeue_pi() - Wait on uaddr and take uaddr2
  * @uaddr:	the futex we initially wait on (non-pi)
- * @fshared:	whether the futexes are shared (1) or not (0).  They must be
+ * @flags:	futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, etc.), they must be
  * 		the same type, no requeueing from private to shared, etc.
  * @val:	the expected value of uaddr
  * @abs_time:	absolute timeout
@@ -2204,9 +2197,9 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
  *  0 - On success
  * <0 - On error
  */
-static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
+static int futex_wait_requeue_pi(u32 __user *uaddr, int flags,
 				 u32 val, ktime_t *abs_time, u32 bitset,
-				 int clockrt, u32 __user *uaddr2)
+				 u32 __user *uaddr2)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct rt_mutex_waiter rt_waiter;
@@ -2221,8 +2214,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 
 	if (abs_time) {
 		to = &timeout;
-		hrtimer_init_on_stack(&to->timer, clockrt ? CLOCK_REALTIME :
-				      CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+		hrtimer_init_on_stack(&to->timer, (flags & FLAGS_CLOCKRT) ?
+				      CLOCK_REALTIME : CLOCK_MONOTONIC,
+				      HRTIMER_MODE_ABS);
 		hrtimer_init_sleeper(to, current);
 		hrtimer_set_expires_range_ns(&to->timer, *abs_time,
 					     current->timer_slack_ns);
@@ -2236,7 +2230,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	rt_waiter.task = NULL;
 
 	key2 = FUTEX_KEY_INIT;
-	ret = get_futex_key(uaddr2, fshared, &key2);
+	ret = get_futex_key(uaddr2, flags & FLAGS_SHARED, &key2);
 	if (unlikely(ret != 0))
 		goto out;
 
@@ -2246,7 +2240,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	q.requeue_pi_key = &key2;
 
 	/* Prepare to wait on uaddr. */
-	ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
+	ret = futex_wait_setup(uaddr, val, flags, &q, &hb);
 	if (ret)
 		goto out_key2;
 
@@ -2274,8 +2268,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 		 */
 		if (q.pi_state && (q.pi_state->owner != current)) {
 			spin_lock(q.lock_ptr);
-			ret = fixup_pi_state_owner(uaddr2, &q, current,
-						   fshared);
+			ret = fixup_pi_state_owner(uaddr2, &q, current);
 			spin_unlock(q.lock_ptr);
 		}
 	} else {
@@ -2294,7 +2287,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 		 * Fixup the pi_state owner and possibly acquire the lock if we
 		 * haven't already.
 		 */
-		res = fixup_owner(uaddr2, fshared, &q, !ret);
+		res = fixup_owner(uaddr2, &q, !ret);
 		/*
 	 * If fixup_owner() returned an error, propagate that.  If it
 		 * acquired the lock, clear -ETIMEDOUT or -EINTR.
@@ -2325,9 +2318,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
 	}
 
 out_put_keys:
-	put_futex_key(fshared, &q.key);
+	put_futex_key(&q.key);
 out_key2:
-	put_futex_key(fshared, &key2);
+	put_futex_key(&key2);
 
 out:
 	if (to) {
@@ -2551,58 +2544,58 @@ void exit_robust_list(struct task_struct *curr)
 long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		u32 __user *uaddr2, u32 val2, u32 val3)
 {
-	int clockrt, ret = -ENOSYS;
+	int ret = -ENOSYS;
 	int cmd = op & FUTEX_CMD_MASK;
-	int fshared = 0;
+	int flags = 0;
 
 	if (!(op & FUTEX_PRIVATE_FLAG))
-		fshared = 1;
+		flags |= FLAGS_SHARED;
 
-	clockrt = op & FUTEX_CLOCK_REALTIME;
-	if (clockrt && cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
-		return -ENOSYS;
+	if (op & FUTEX_CLOCK_REALTIME) {
+		flags |= FLAGS_CLOCKRT;
+		if (cmd != FUTEX_WAIT_BITSET && cmd != FUTEX_WAIT_REQUEUE_PI)
+			return -ENOSYS;
+	}
 
 	switch (cmd) {
 	case FUTEX_WAIT:
 		val3 = FUTEX_BITSET_MATCH_ANY;
 	case FUTEX_WAIT_BITSET:
-		ret = futex_wait(uaddr, fshared, val, timeout, val3, clockrt);
+		ret = futex_wait(uaddr, flags, val, timeout, val3);
 		break;
 	case FUTEX_WAKE:
 		val3 = FUTEX_BITSET_MATCH_ANY;
 	case FUTEX_WAKE_BITSET:
-		ret = futex_wake(uaddr, fshared, val, val3);
+		ret = futex_wake(uaddr, flags, val, val3);
 		break;
 	case FUTEX_REQUEUE:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, NULL, 0);
+		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, NULL, 0);
 		break;
 	case FUTEX_CMP_REQUEUE:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
-				    0);
+		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 0);
 		break;
 	case FUTEX_WAKE_OP:
-		ret = futex_wake_op(uaddr, fshared, uaddr2, val, val2, val3);
+		ret = futex_wake_op(uaddr, flags, uaddr2, val, val2, val3);
 		break;
 	case FUTEX_LOCK_PI:
 		if (futex_cmpxchg_enabled)
-			ret = futex_lock_pi(uaddr, fshared, val, timeout, 0);
+			ret = futex_lock_pi(uaddr, flags, val, timeout, 0);
 		break;
 	case FUTEX_UNLOCK_PI:
 		if (futex_cmpxchg_enabled)
-			ret = futex_unlock_pi(uaddr, fshared);
+			ret = futex_unlock_pi(uaddr, flags);
 		break;
 	case FUTEX_TRYLOCK_PI:
 		if (futex_cmpxchg_enabled)
-			ret = futex_lock_pi(uaddr, fshared, 0, timeout, 1);
+			ret = futex_lock_pi(uaddr, flags, 0, timeout, 1);
 		break;
 	case FUTEX_WAIT_REQUEUE_PI:
 		val3 = FUTEX_BITSET_MATCH_ANY;
-		ret = futex_wait_requeue_pi(uaddr, fshared, val, timeout, val3,
-					    clockrt, uaddr2);
+		ret = futex_wait_requeue_pi(uaddr, flags, val, timeout, val3,
+					    uaddr2);
 		break;
 	case FUTEX_CMP_REQUEUE_PI:
-		ret = futex_requeue(uaddr, fshared, uaddr2, val, val2, &val3,
-				    1);
+		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
 		break;
 	default:
 		ret = -ENOSYS;
-- 
1.6.3.3



* [PATCH 2/6] futex: add futex_q static initializer
  2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
  2010-04-05 20:23 ` [PATCH 1/6] futex: replace fshared and clockrt with combined flags Darren Hart
@ 2010-04-05 20:23 ` Darren Hart
  2010-04-05 20:23 ` [PATCH 3/6] futex: refactor futex_lock_pi_atomic Darren Hart
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-05 20:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Darren Hart

The futex_q struct has grown considerably over the last year or so. I
believe it now merits a static initializer to avoid uninitialized data
errors (having just spent more time than I care to admit debugging
an uninitialized q.bitset in an experimental new op code).

I originally planned on following the __THING_INITIALIZER/DECLARE_THING
method, but since we already had FUTEX_KEY_INIT (a style I personally
prefer), I went that route.
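
For contrast, the two idioms look roughly like this (a generic
illustration, not futex code):

#include <stddef.h>

struct foo {
	int a;
	void *b;
};

/* Idiom 1: a __FOO_INITIALIZER macro wrapped by DECLARE_FOO(). */
#define __FOO_INITIALIZER { .a = 0, .b = NULL }
#define DECLARE_FOO(name) struct foo name = __FOO_INITIALIZER

/* Idiom 2: a bare _INIT macro used at the definition site, matching the
 * existing FUTEX_KEY_INIT style. */
#define FOO_INIT { .a = 0, .b = NULL }

void example(void)
{
	DECLARE_FOO(x);			/* idiom 1 */
	struct foo y = FOO_INIT;	/* idiom 2, the style used here */
}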

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 kernel/futex.c |   19 +++++++++----------
 1 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index ed08cfd..2ae18cd 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -130,6 +130,12 @@ struct futex_q {
 	u32 bitset;
 };
 
+#define FUTEX_Q_INIT \
+	{ /* list gets initialized in queue_me() */ \
+	  .task = NULL, .lock_ptr = NULL, .key = FUTEX_KEY_INIT, .pi_state = NULL, \
+	  .rt_waiter = NULL, .requeue_pi_key = NULL, .bitset = FUTEX_BITSET_MATCH_ANY }
+
+
 /*
  * Hash buckets are shared by all the futex_keys that hash to the same
  * location.  Each key may have multiple futex_q structures, one for each task
@@ -1799,16 +1805,13 @@ static int futex_wait(u32 __user *uaddr, int flags, u32 val, ktime_t *abs_time,
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct restart_block *restart;
 	struct futex_hash_bucket *hb;
-	struct futex_q q;
+	struct futex_q q = FUTEX_Q_INIT;
 	int ret;
 
 	if (!bitset)
 		return -EINVAL;
 
-	q.pi_state = NULL;
 	q.bitset = bitset;
-	q.rt_waiter = NULL;
-	q.requeue_pi_key = NULL;
 
 	if (abs_time) {
 		to = &timeout;
@@ -1899,7 +1902,7 @@ static int futex_lock_pi(u32 __user *uaddr, int flags, int detect,
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
 	struct futex_hash_bucket *hb;
-	struct futex_q q;
+	struct futex_q q = FUTEX_Q_INIT;
 	int res, ret;
 
 	if (refill_pi_state_cache())
@@ -1913,9 +1916,6 @@ static int futex_lock_pi(u32 __user *uaddr, int flags, int detect,
 		hrtimer_set_expires(&to->timer, *time);
 	}
 
-	q.pi_state = NULL;
-	q.rt_waiter = NULL;
-	q.requeue_pi_key = NULL;
 retry:
 	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
@@ -2206,7 +2206,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int flags,
 	struct rt_mutex *pi_mutex = NULL;
 	struct futex_hash_bucket *hb;
 	union futex_key key2;
-	struct futex_q q;
+	struct futex_q q = FUTEX_Q_INIT;
 	int res, ret;
 
 	if (!bitset)
@@ -2234,7 +2234,6 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int flags,
 	if (unlikely(ret != 0))
 		goto out;
 
-	q.pi_state = NULL;
 	q.bitset = bitset;
 	q.rt_waiter = &rt_waiter;
 	q.requeue_pi_key = &key2;
-- 
1.6.3.3



* [PATCH 3/6] futex: refactor futex_lock_pi_atomic
  2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
  2010-04-05 20:23 ` [PATCH 1/6] futex: replace fshared and clockrt with combined flags Darren Hart
  2010-04-05 20:23 ` [PATCH 2/6] futex: add futex_q static initializer Darren Hart
@ 2010-04-05 20:23 ` Darren Hart
  2010-04-05 20:23 ` [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning Darren Hart
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-05 20:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Darren Hart

Prepare for FUTEX_LOCK by refactoring futex_lock_pi_atomic() into
lock_futex_atomic() and lock_pi_futex_atomic(). The name change is meant to
reflect the naming convention in futex.c that futex_*() functions map directly
to futex op codes and the others are internal helper functions.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>

---
 kernel/futex.c |   79 +++++++++++++++++++++++++++++++++++++------------------
 1 files changed, 53 insertions(+), 26 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 2ae18cd..8c1bb16 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -625,30 +625,23 @@ lookup_pi_state(u32 uval, struct futex_hash_bucket *hb,
 }
 
 /**
- * futex_lock_pi_atomic() - Atomic work required to acquire a pi aware futex
- * @uaddr:		the pi futex user address
- * @hb:			the pi futex hash bucket
- * @key:		the futex key associated with uaddr and hb
- * @ps:			the pi_state pointer where we store the result of the
- *			lookup
- * @task:		the task to perform the atomic lock work for.  This will
- *			be "current" except in the case of requeue pi.
- * @set_waiters:	force setting the FUTEX_WAITERS bit (1) or not (0)
+ * lock_futex_atomic() - Try to acquire the futex lock atomically
+ * @uaddr:       user address of the futex
+ * @task:        the task to perform the atomic lock work for.
+ * @set_waiters: force setting the FUTEX_WAITERS bit (1) or not (0)
+ *
+ * The hb->lock shall be held by the caller.
  *
  * Returns:
  *  0 - ready to wait
  *  1 - acquired the lock
  * <0 - error
- *
- * The hb->lock and futex_key refs shall be held by the caller.
  */
-static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
-				union futex_key *key,
-				struct futex_pi_state **ps,
-				struct task_struct *task, int set_waiters)
+static int lock_futex_atomic(u32 __user *uaddr, struct task_struct *task,
+                             int set_waiters)
 {
-	int lock_taken, ret, ownerdied = 0;
 	u32 uval, newval, curval;
+	int lock_taken, ret;
 
 retry:
 	ret = lock_taken = 0;
@@ -695,10 +688,10 @@ retry:
 	 *
 	 * This is safe as we are protected by the hash bucket lock !
 	 */
-	if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) {
+	if (unlikely((curval & FUTEX_OWNER_DIED) ||
+	             !(curval & FUTEX_TID_MASK))) {
 		/* Keep the OWNER_DIED bit */
 		newval = (curval & ~FUTEX_TID_MASK) | task_pid_vnr(task);
-		ownerdied = 0;
 		lock_taken = 1;
 	}
 
@@ -715,10 +708,46 @@ retry:
 	if (unlikely(lock_taken))
 		return 1;
 
+	return ret;
+}
+
+/**
+ * lock_pi_futex_atomic() - Atomic work required to acquire a pi aware futex
+ * @uaddr:		the pi futex user address
+ * @hb:			the pi futex hash bucket
+ * @key:		the futex key associated with uaddr and hb
+ * @ps:			the pi_state pointer where we store the result of the
+ *			lookup
+ * @task:		the task to perform the atomic lock work for.  This will
+ *			be "current" except in the case of requeue pi.
+ * @set_waiters:	force setting the FUTEX_WAITERS bit (1) or not (0)
+ *
+ * Returns:
+ *  0 - ready to wait
+ *  1 - acquired the lock
+ * <0 - error
+ *
+ * The hb->lock and futex_key refs shall be held by the caller.
+ */
+static int lock_pi_futex_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
+				union futex_key *key,
+				struct futex_pi_state **ps,
+				struct task_struct *task, int set_waiters)
+{
+	u32 uval;
+	int ret;
+
+retry:
+	ret = lock_futex_atomic(uaddr, task, set_waiters);
+	if (ret)
+		return ret;
+
 	/*
 	 * We dont have the lock. Look up the PI state (or create it if
 	 * we are the first waiter):
 	 */
+	if (get_futex_value_locked(&uval, uaddr))
+		return -EFAULT;
 	ret = lookup_pi_state(uval, hb, key, ps);
 
 	if (unlikely(ret)) {
@@ -729,7 +758,7 @@ retry:
 			 * OWNER_DIED bit is set to figure out whether
 			 * this is a robust futex or not.
 			 */
-			if (get_futex_value_locked(&curval, uaddr))
+			if (get_futex_value_locked(&uval, uaddr))
 				return -EFAULT;
 
 			/*
@@ -737,10 +766,8 @@ retry:
 			 * futex. The code above will take the futex
 			 * and return happy.
 			 */
-			if (curval & FUTEX_OWNER_DIED) {
-				ownerdied = 1;
+			if (uval & FUTEX_OWNER_DIED)
 				goto retry;
-			}
 		default:
 			break;
 		}
@@ -1100,7 +1127,7 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key,
  *
  * Try and get the lock on behalf of the top waiter if we can do it atomically.
  * Wake the top waiter if we succeed.  If the caller specified set_waiters,
- * then direct futex_lock_pi_atomic() to force setting the FUTEX_WAITERS bit.
+ * then direct lock_pi_futex_atomic() to force setting the FUTEX_WAITERS bit.
  * hb1 and hb2 must be held by the caller.
  *
  * Returns:
@@ -1124,7 +1151,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 	/*
 	 * Find the top_waiter and determine if there are additional waiters.
 	 * If the caller intends to requeue more than 1 waiter to pifutex,
-	 * force futex_lock_pi_atomic() to set the FUTEX_WAITERS bit now,
+	 * force lock_pi_futex_atomic() to set the FUTEX_WAITERS bit now,
 	 * as we have means to handle the possible fault.  If not, don't set
 	 * the bit unnecessarily as it will force the subsequent unlock to enter
 	 * the kernel.
@@ -1144,7 +1171,7 @@ static int futex_proxy_trylock_atomic(u32 __user *pifutex,
 	 * the contended case or if set_waiters is 1.  The pi_state is returned
 	 * in ps in contended cases.
 	 */
-	ret = futex_lock_pi_atomic(pifutex, hb2, key2, ps, top_waiter->task,
+	ret = lock_pi_futex_atomic(pifutex, hb2, key2, ps, top_waiter->task,
 				   set_waiters);
 	if (ret == 1)
 		requeue_pi_wake_futex(top_waiter, key2, hb2);
@@ -1925,7 +1952,7 @@ retry:
 retry_private:
 	hb = queue_lock(&q);
 
-	ret = futex_lock_pi_atomic(uaddr, hb, &q.key, &q.pi_state, current, 0);
+	ret = lock_pi_futex_atomic(uaddr, hb, &q.key, &q.pi_state, current, 0);
 	if (unlikely(ret)) {
 		switch (ret) {
 		case 1:
-- 
1.6.3.3



* [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
                   ` (2 preceding siblings ...)
  2010-04-05 20:23 ` [PATCH 3/6] futex: refactor futex_lock_pi_atomic Darren Hart
@ 2010-04-05 20:23 ` Darren Hart
  2010-04-06 16:55   ` Thomas Gleixner
  2010-04-05 20:23 ` [PATCH 5/6] futex: handle timeout inside adaptive lock spin Darren Hart
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-05 20:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Darren Hart, Peter Zijlstra

Add a non-pi TID value based futex locking mechanism. This enables the
use of adaptive spinning which was problematic with the basic FUTEX_WAIT
operation.

V4: Remove update_futex_waiters() and perform all waiter clearing in
    userspace. Turns out to be generally better to do an O(1) operation in
    userspace that forces one extra syscall than to do an O(N) operation during
    every syscall. Go figure :).
    Cleanup timeout when exiting adaptive loop.
    Perform adaptive spinning after retry:
    General cleanups.
V3: Fix some locking bugs
    Use cmpxchg directly in the adaptive loop
    Move the adaptive loop into its own function
    Use FUTEX_WAIT instead of FUTEX_UNLOCK
    Add update_futex_waiters()
V2: Fix preempt-count breakage
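
The V4 note above concerns the userspace half of the unlock path: instead
of the kernel maintaining exact waiter state, userspace clears the value
with one atomic op and makes at most one extra syscall when the
FUTEX_WAITERS bit was set. A sketch, reusing the illustrative
futex_mutex_t from the cover letter and assuming FUTEX_WAKE for the wakeup
(the kernel-side waiters queue with FUTEX_BITSET_MATCH_ANY, so FUTEX_WAKE
reaches them):

/* O(1) unlock: one atomic exchange, plus one FUTEX_WAKE syscall only when
 * a waiter may be queued in the kernel. Illustrative only. */
static void futex_mutex_unlock(futex_mutex_t *m)
{
	/* Atomically take the value to 0 (full barrier). */
	unsigned int old = __sync_fetch_and_and(&m->val, 0);

	if (old & FUTEX_WAITERS)
		syscall(SYS_futex, &m->val, FUTEX_WAKE, 1, NULL, NULL, 0);
}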

This patch is a forward port of the following patch from Peter Zijlstra
with some bug fixes and a reworking of the futex op codes.

> futex: Implement kernel-side adaptive spinning for futexes
>
> This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
> requires the futex lock field to contain a TID and regular futexes do
> not have that restriction.
>
> FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a
> different cpu) or until the task gets preempted. After that it behaves
> like FUTEX_WAIT.
>
> The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
> on the lower 30 bits (TID_MASK) of the lock field -- leaving the
> WAITERS and OWNER_DIED flags intact.
>
> NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
> adding a lock_owner function argument.
>
>   void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
>                           void *lock, struct thread_info *owner);
>
> Peter Zijlstra's original patch forward ported by Darren Hart.
>

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 include/linux/futex.h |    6 +
 include/linux/sched.h |    1 +
 kernel/futex.c        |  282 +++++++++++++++++++++++++++++++++++++++++++------
 kernel/sched.c        |   59 ++++++++++
 4 files changed, 314 insertions(+), 34 deletions(-)

diff --git a/include/linux/futex.h b/include/linux/futex.h
index 1e5a26d..3ca93d1 100644
--- a/include/linux/futex.h
+++ b/include/linux/futex.h
@@ -20,6 +20,8 @@
 #define FUTEX_WAKE_BITSET	10
 #define FUTEX_WAIT_REQUEUE_PI	11
 #define FUTEX_CMP_REQUEUE_PI	12
+#define FUTEX_LOCK		13
+#define FUTEX_LOCK_ADAPTIVE	14
 
 #define FUTEX_PRIVATE_FLAG	128
 #define FUTEX_CLOCK_REALTIME	256
@@ -39,6 +41,9 @@
 					 FUTEX_PRIVATE_FLAG)
 #define FUTEX_CMP_REQUEUE_PI_PRIVATE	(FUTEX_CMP_REQUEUE_PI | \
 					 FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_PRIVATE		(FUTEX_LOCK | FUTEX_PRIVATE_FLAG)
+#define FUTEX_LOCK_ADAPTIVE_PRIVATE	(FUTEX_LOCK_ADAPTIVE | \
+					 FUTEX_PRIVATE_FLAG)
 
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
@@ -177,6 +182,7 @@ union futex_key {
 extern void exit_robust_list(struct task_struct *curr);
 extern void exit_pi_state_list(struct task_struct *curr);
 extern int futex_cmpxchg_enabled;
+extern struct thread_info *futex_owner(u32 __user *uaddr);
 #else
 static inline void exit_robust_list(struct task_struct *curr)
 {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index dad7f66..885d659 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -362,6 +362,7 @@ extern signed long schedule_timeout_killable(signed long timeout);
 extern signed long schedule_timeout_uninterruptible(signed long timeout);
 asmlinkage void schedule(void);
 extern int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner);
+extern int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner);
 
 struct nsproxy;
 struct user_namespace;
diff --git a/kernel/futex.c b/kernel/futex.c
index 8c1bb16..c33ac2a 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -75,6 +75,7 @@ int __read_mostly futex_cmpxchg_enabled;
 #define FLAGS_SHARED		0x01
 #define FLAGS_CLOCKRT		0x02
 #define FLAGS_HAS_TIMEOUT	0x04
+#define FLAGS_ADAPTIVE		0x08
 
 /*
  * Priority Inheritance state:
@@ -368,6 +369,35 @@ static int get_futex_value_locked(u32 *dest, u32 __user *from)
 	return ret ? -EFAULT : 0;
 }
 
+#ifdef CONFIG_SMP
+struct thread_info *futex_owner(u32 __user *uaddr)
+{
+	struct thread_info *ti = NULL;
+	struct task_struct *p;
+	pid_t pid;
+	u32 uval;
+
+	if (get_futex_value_locked(&uval, uaddr))
+		return NULL;
+
+	pid = uval & FUTEX_TID_MASK;
+	rcu_read_lock();
+	p = find_task_by_vpid(pid);
+	if (p) {
+		const struct cred *cred, *pcred;
+
+		cred = current_cred();
+		pcred = __task_cred(p);
+		if (cred->euid == pcred->euid ||
+		    cred->euid == pcred->uid)
+			ti = task_thread_info(p);
+	}
+	rcu_read_unlock();
+
+	return ti;
+}
+#endif
+
 
 /*
  * PI code:
@@ -717,7 +747,7 @@ retry:
  * @hb:			the pi futex hash bucket
  * @key:		the futex key associated with uaddr and hb
  * @ps:			the pi_state pointer where we store the result of the
- *			lookup
+ *			lookup. If non-NULL, hb and key must also be non-NULL.
  * @task:		the task to perform the atomic lock work for.  This will
  *			be "current" except in the case of requeue pi.
  * @set_waiters:	force setting the FUTEX_WAITERS bit (1) or not (0)
@@ -727,7 +757,8 @@ retry:
  *  1 - acquired the lock
  * <0 - error
  *
- * The hb->lock and futex_key refs shall be held by the caller.
+ * The hb->lock and futex_key refs shall be held by the caller if the pi_state
+ * pointer is non-NULL.
  */
 static int lock_pi_futex_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb,
 				union futex_key *key,
@@ -741,37 +772,32 @@ retry:
 	ret = lock_futex_atomic(uaddr, task, set_waiters);
 	if (ret)
 		return ret;
-
-	/*
-	 * We dont have the lock. Look up the PI state (or create it if
-	 * we are the first waiter):
+	/*
+	 * We don't have the lock.
+	 * Look up the PI state (or create it if we are the first waiter).
+	 * Callers performing a non-pi atomic acquisition pass ps == NULL.
 	 */
-	if (get_futex_value_locked(&uval, uaddr))
-		return -EFAULT;
-	ret = lookup_pi_state(uval, hb, key, ps);
+	if (ps) {
+		if (get_futex_value_locked(&uval, uaddr))
+			return -EFAULT;
+		ret = lookup_pi_state(uval, hb, key, ps);
+	}
 
-	if (unlikely(ret)) {
-		switch (ret) {
-		case -ESRCH:
-			/*
-			 * No owner found for this futex. Check if the
-			 * OWNER_DIED bit is set to figure out whether
-			 * this is a robust futex or not.
-			 */
-			if (get_futex_value_locked(&uval, uaddr))
-				return -EFAULT;
+	if (!ps || unlikely(ret == -ESRCH)) {
+		/*
+		 * No owner found for this futex. Check if the
+		 * OWNER_DIED bit is set to figure out whether
+		 * this is a robust futex or not.
+		 */
+		if (get_futex_value_locked(&uval, uaddr))
+			return -EFAULT;
 
-			/*
-			 * We simply start over in case of a robust
-			 * futex. The code above will take the futex
-			 * and return happy.
-			 */
-			if (uval & FUTEX_OWNER_DIED)
-				goto retry;
-		default:
-			break;
-		}
-	}
+		/*
+		 * We simply start over in case of a robust
+		 * futex. The code above will take the futex
+		 * and return happy.
+		 */
+		if (uval & FUTEX_OWNER_DIED)
+			goto retry;
+	}
 
 	return ret;
 }
@@ -874,13 +900,13 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this)
 	return 0;
 }
 
-static int unlock_futex_pi(u32 __user *uaddr, u32 uval)
+static int unlock_futex(u32 __user *uaddr, u32 uval)
 {
 	u32 oldval;
 
 	/*
 	 * There is no waiter, so we unlock the futex. The owner died
-	 * bit has not to be preserved here. We are the owner:
+	 * bit does not need to be preserved here. We are the owner:
 	 */
 	oldval = cmpxchg_futex_value_locked(uaddr, uval, 0);
 
@@ -2112,7 +2138,7 @@ retry:
 	 * No waiters - kernel unlocks the futex:
 	 */
 	if (!(uval & FUTEX_OWNER_DIED)) {
-		ret = unlock_futex_pi(uaddr, uval);
+		ret = unlock_futex(uaddr, uval);
 		if (ret == -EFAULT)
 			goto pi_faulted;
 	}
@@ -2356,6 +2382,187 @@ out:
 	return ret;
 }
 
+/**
+ * trylock_futex_adaptive() - Try to acquire the futex lock in a busy loop
+ * @uaddr: the futex user address
+ *
+ * Try to acquire a futex lock in a loop until the owner changes or the owner
+ * is descheduled. To lock the futex, set the value to the current TID.
+ *
+ * Returns:
+ *  0 - Gave up, lock not acquired
+ *  1 - Futex lock acquired
+ * <0 - On error
+ */
+static int trylock_futex_adaptive(u32 __user *uaddr)
+{
+	int ret = 0;
+	u32 curval;
+
+	for (;;) {
+		struct thread_info *owner;
+
+		owner = futex_owner(uaddr);
+		if (owner && futex_spin_on_owner(uaddr, owner))
+			break;
+
+		/*
+		 * Try to acquire the lock. If we acquire it or error out,
+		 * break to enable preemption and handle ret outside the loop.
+		 */
+		curval = cmpxchg_futex_value_locked(uaddr, 0,
+		                                    task_pid_vnr(current));
+
+		if (curval == 0) {
+			ret = 1;
+			break;
+		}
+
+		if (unlikely(curval == -EFAULT)) {
+			ret = -EFAULT;
+			break;
+		}
+
+		/*
+		 * Futex locks manage the owner atomically. We are not in
+		 * danger of deadlock by preempting a pending owner. Abort the
+		 * loop if our time slice has expired.  RT Threads can spin
+		 * until they preempt the owner, at which point they will break
+		 * out of the loop anyway.
+		 */
+		if (need_resched())
+			break;
+
+		cpu_relax();
+	}
+	return ret;
+}
+
+/**
+ * futex_lock() - Acquire the lock and set the futex value
+ * @uaddr:  the futex user address
+ * @flags:  futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, FLAGS_ADAPTIVE, etc.)
+ * @detect: detect deadlock (1) or not (0)
+ * @time:   absolute timeout
+ *
+ * futex_(un)lock() define a futex value policy and implement a full mutex. The
+ * futex value stores the owner's TID or'd with FUTEX_WAITERS and/or
+ * FUTEX_OWNER_DIED as appropriate.
+ *
+ * Userspace tried a 0 -> TID atomic transition of the futex value and failed.
+ * Try to acquire the lock in the kernel, blocking if necessary. Return to
+ * userspace with the lock held and the futex value set accordingly (or after a
+ * timeout).
+ *
+ * Returns:
+ *  0 - On success
+ * <0 - On error
+ */
+static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
+{
+	struct hrtimer_sleeper timeout, *to = NULL;
+	struct futex_hash_bucket *hb;
+	struct futex_q q = FUTEX_Q_INIT;
+	int ret = 0;
+
+	if (refill_pi_state_cache())
+		return -ENOMEM;
+
+	if (time) {
+		to = &timeout;
+		hrtimer_init_on_stack(&to->timer, CLOCK_REALTIME,
+				      HRTIMER_MODE_ABS);
+		hrtimer_init_sleeper(to, current);
+		hrtimer_set_expires(&to->timer, *time);
+	}
+
+retry:
+#ifdef CONFIG_SMP
+	if (flags & FLAGS_ADAPTIVE) {
+		preempt_disable();
+		ret = trylock_futex_adaptive(uaddr);
+		preempt_enable();
+
+		/* We got the lock. */
+		if (ret == 1) {
+			ret = 0;
+			goto out;
+		}
+
+		/* We encountered an error, -EFAULT most likely. */
+		if (ret)
+			goto out;
+	}
+#endif
+	q.key = FUTEX_KEY_INIT;
+	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
+	if (unlikely(ret != 0))
+		goto out;
+
+	hb = queue_lock(&q);
+
+	ret = lock_futex_atomic(uaddr, current, 0);
+	if (unlikely(ret)) {
+		switch (ret) {
+		case 1:
+			/* We got the lock. */
+			ret = 0;
+			goto out_unlock_put_key;
+		case -EFAULT:
+			goto uaddr_faulted;
+		default:
+			goto out_unlock_put_key;
+		}
+	}
+	/*
+	 * Only actually queue now that the atomic ops are done:
+	 */
+	futex_wait_queue_me(hb, &q, to);
+
+	ret = unqueue_me(&q);
+
+	/* If we were woken by the owner, try and acquire the lock. */
+	if (!ret)
+		goto retry;
+
+	ret = -ETIMEDOUT;
+	if (to && !to->task)
+		goto out_put_key;
+
+	/*
+	 * We expect signal_pending(current), but we might be the
+	 * victim of a spurious wakeup as well.
+	 */
+	ret = -ERESTARTNOINTR;
+	if (!signal_pending(current)) {
+		put_futex_key(&q.key);
+		goto retry;
+	}
+
+	goto out_put_key;
+
+out_unlock_put_key:
+	queue_unlock(&q, hb);
+
+out_put_key:
+	put_futex_key(&q.key);
+out:
+	if (to)
+		destroy_hrtimer_on_stack(&to->timer);
+	return ret != -EINTR ? ret : -ERESTARTNOINTR;
+
+uaddr_faulted:
+	queue_unlock(&q, hb);
+
+	ret = fault_in_user_writeable(uaddr);
+	if (ret)
+		goto out_put_key;
+
+	put_futex_key(&q.key);
+	goto retry;
+}
+
+
 /*
  * Support for robust futexes: the kernel cleans up held futexes at
  * thread exit time.
@@ -2623,6 +2830,12 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 	case FUTEX_CMP_REQUEUE_PI:
 		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
 		break;
+	case FUTEX_LOCK_ADAPTIVE:
+		flags |= FLAGS_ADAPTIVE;
+	case FUTEX_LOCK:
+		if (futex_cmpxchg_enabled)
+			ret = futex_lock(uaddr, flags, val, timeout);
+		break;
 	default:
 		ret = -ENOSYS;
 	}
@@ -2641,7 +2854,8 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
 
 	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
 		      cmd == FUTEX_WAIT_BITSET ||
-		      cmd == FUTEX_WAIT_REQUEUE_PI)) {
+		      cmd == FUTEX_WAIT_REQUEUE_PI ||
+		      cmd == FUTEX_LOCK || cmd == FUTEX_LOCK_ADAPTIVE)) {
 		if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
 			return -EFAULT;
 		if (!timespec_valid(&ts))
diff --git a/kernel/sched.c b/kernel/sched.c
index 9ab3cd7..a2dbb5b 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3818,6 +3818,65 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
 out:
 	return 1;
 }
+
+#ifdef CONFIG_FUTEX
+#include <linux/futex.h>
+
+int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
+{
+	unsigned int cpu;
+	struct rq *rq;
+
+	if (!sched_feat(OWNER_SPIN))
+		return 0;
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+	/*
+	 * Need to access the cpu field knowing that
+	 * DEBUG_PAGEALLOC could have unmapped it if
+	 * the mutex owner just released it and exited.
+	 */
+	if (probe_kernel_address(&owner->cpu, cpu))
+		goto out;
+#else
+	cpu = owner->cpu;
+#endif
+
+	/*
+	 * Even if the access succeeded (likely case),
+	 * the cpu field may no longer be valid.
+	 */
+	if (cpu >= nr_cpumask_bits)
+		goto out;
+
+	/*
+	 * We need to validate that we can do a
+	 * get_cpu() and that we have the percpu area.
+	 */
+	if (!cpu_online(cpu))
+		goto out;
+
+	rq = cpu_rq(cpu);
+
+	for (;;) {
+		/*
+		 * Owner changed, break to re-assess state.
+		 */
+		if (futex_owner(uaddr) != owner)
+			break;
+
+		/*
+		 * Is that owner really running on that cpu?
+		 */
+		if (task_thread_info(rq->curr) != owner || need_resched())
+			return 0;
+
+		cpu_relax();
+	}
+out:
+	return 1;
+}
+#endif
 #endif
 
 #ifdef CONFIG_PREEMPT
-- 
1.6.3.3



* [PATCH 5/6] futex: handle timeout inside adaptive lock spin
  2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
                   ` (3 preceding siblings ...)
  2010-04-05 20:23 ` [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning Darren Hart
@ 2010-04-05 20:23 ` Darren Hart
  2010-04-06  8:27   ` Thomas Gleixner
  2010-04-05 20:23 ` [PATCH 6/6] futex: Add aggressive adaptive spinning argument to FUTEX_LOCK Darren Hart
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-05 20:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Darren Hart

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 kernel/futex.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index c33ac2a..af61dcd 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2385,6 +2385,7 @@ out:
 /**
  * trylock_futex_adaptive() - Try to acquire the futex lock in a busy loop
  * @uaddr: the futex user address
+ * @timeout: absolute timeout or NULL if none
  *
  * Try to acquire a futex lock in a loop until the owner changes or the owner
  * is descheduled. To lock the futex, set the value to the current TID.
@@ -2394,10 +2395,11 @@ out:
  *  1 - Futex lock acquired
  * <0 - On error
  */
-static int trylock_futex_adaptive(u32 __user *uaddr)
+static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
 {
 	int ret = 0;
 	u32 curval;
+	ktime_t now;
 
 	for (;;) {
 		struct thread_info *owner;
@@ -2433,6 +2435,22 @@ static int trylock_futex_adaptive(u32 __user *uaddr)
 		if (need_resched())
 			break;
 
+		if (timeout) {
+			now = ktime_get();
+/* FIXME: consider creating ktime_less_than(lhs, rhs) */
+#if (BITS_PER_LONG == 64) || defined(CONFIG_KTIME_SCALAR)
+			if (timeout->tv64 < now.tv64) {
+				ret = -ETIMEDOUT;
+				break;
+			}
+#else
+			if (timeout->tv.sec < now.tv.sec ||
+			    (timeout->tv.sec == now.tv.sec &&
+			     timeout->tv.nsec < now.tv.nsec)) {
+				ret = -ETIMEDOUT;
+				break;
+			}
+#endif
+		}
+
 		cpu_relax();
 	}
 	return ret;
@@ -2480,7 +2498,7 @@ retry:
 #ifdef CONFIG_SMP
 	if (flags & FLAGS_ADAPTIVE) {
 		preempt_disable();
-		ret = trylock_futex_adaptive(uaddr);
+		ret = trylock_futex_adaptive(uaddr, time);
 		preempt_enable();
 
 		/* We got the lock. */
@@ -2489,7 +2507,7 @@ retry:
 			goto out;
 		}
 
-		/* We encountered an error, -EFAULT most likely. */
+		/* We encountered an error, -EFAULT or -ETIMEDOUT */
 		if (ret)
 			goto out;
 	}
-- 
1.6.3.3



* [PATCH 6/6] futex: Add aggressive adaptive spinning argument to FUTEX_LOCK
  2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
                   ` (4 preceding siblings ...)
  2010-04-05 20:23 ` [PATCH 5/6] futex: handle timeout inside adaptive lock spin Darren Hart
@ 2010-04-05 20:23 ` Darren Hart
  2010-04-08  5:58   ` Darren Hart
  2010-04-05 20:48 ` [PATCH V2^W V4 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-05 20:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Darren Hart

Aggressive adaptive spinning (aas) allows more than one spinner in the
adaptive loop at a given time. In some scenarios this yields better
results than the default single-spinner mode.
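
The mode is selected per call through the val3 syscall argument (see the
do_futex() change below); an illustrative invocation:

#include <unistd.h>
#include <sys/syscall.h>

#define FUTEX_LOCK_ADAPTIVE	14	/* op code proposed in this series */

/* syscall args: uaddr, op, val (deadlock detect), timeout, uaddr2, val3.
 * val3 carries the new aas argument. */
static long futex_lock_adaptive(unsigned int *uaddr, int aas)
{
	return syscall(SYS_futex, uaddr, FUTEX_LOCK_ADAPTIVE, 0, NULL,
		       NULL, aas);
}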

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
---
 kernel/futex.c |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index af61dcd..ddbd158 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2462,6 +2462,9 @@ static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
  * @flags:  futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, FLAGS_ADAPTIVE, etc.)
  * @detect: detect deadlock (1) or not (0)
  * @time:   absolute timeout
+ * @aas:    aggressive adaptive spinning: allow multiple spinners (1) or only
+ *          one spinner (0). Only valid in conjunction with the FLAGS_ADAPTIVE
+ *          flag.
  *
  * futex_(un)lock() define a futex value policy and implement a full mutex. The
  * futex value stores the owner's TID or'd with FUTEX_WAITERS and/or
@@ -2476,12 +2479,14 @@ static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
  *  0 - On success
  * <0 - On error
  */
-static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
+static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time,
+                      int aas)
 {
 	struct hrtimer_sleeper timeout, *to = NULL;
-	struct futex_hash_bucket *hb;
 	struct futex_q q = FUTEX_Q_INIT;
+	struct futex_hash_bucket *hb;
 	int ret = 0;
+	u32 uval;
 
 	if (refill_pi_state_cache())
 		return -ENOMEM;
@@ -2497,6 +2502,14 @@ static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
 retry:
 #ifdef CONFIG_SMP
 	if (flags & FLAGS_ADAPTIVE) {
+		if (!aas) {
+			ret = get_user(uval, uaddr);
+			if (ret)
+				goto out;
+			if (uval & FUTEX_WAITERS)
+				goto skip_adaptive;
+		}
+
 		preempt_disable();
 		ret = trylock_futex_adaptive(uaddr, time);
 		preempt_enable();
@@ -2512,6 +2525,8 @@ retry:
 			goto out;
 	}
 #endif
+
+skip_adaptive:
 	q.key = FUTEX_KEY_INIT;
 	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
 	if (unlikely(ret != 0))
@@ -2852,7 +2867,7 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
 		flags |= FLAGS_ADAPTIVE;
 	case FUTEX_LOCK:
 		if (futex_cmpxchg_enabled)
-			ret = futex_lock(uaddr, flags, val, timeout);
+			ret = futex_lock(uaddr, flags, val, timeout, val3);
 		break;
 	default:
 		ret = -ENOSYS;
-- 
1.6.3.3



* Re: [PATCH V2^W V4 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
                   ` (5 preceding siblings ...)
  2010-04-05 20:23 ` [PATCH 6/6] futex: Add aggressive adaptive spinning argument to FUTEX_LOCK Darren Hart
@ 2010-04-05 20:48 ` Darren Hart
  2010-04-05 21:15 ` [PATCH V2 " Avi Kivity
  2010-04-06 15:29 ` Peter Zijlstra
  8 siblings, 0 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-05 20:48 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity

And by V2 I meant V4....

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
                   ` (6 preceding siblings ...)
  2010-04-05 20:48 ` [PATCH V2^W V4 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
@ 2010-04-05 21:15 ` Avi Kivity
  2010-04-05 21:54   ` Darren Hart
  2010-04-06  8:48   ` Peter Zijlstra
  2010-04-06 15:29 ` Peter Zijlstra
  8 siblings, 2 replies; 62+ messages in thread
From: Avi Kivity @ 2010-04-05 21:15 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On 04/05/2010 11:23 PM, Darren Hart wrote:
> In-Reply-To:
>
> NOT FOR INCLUSION
>
> The following patch series implements a new experimental kernel side futex mutex
> via new FUTEX_LOCK and FUTEX_LOCK_ADAPTIVE futex op codes. The adaptive spin
> follows the kernel mutex model of allowing one spinner until the lock is
> released or the owner is descheduled. The patch currently allows the user to
> specify if they want no spinning, a single adaptive spinner, or multiple
> spinners (aggressive adaptive spinning, or aas... which I have mistyped as "ass"
> enough times to realize a better term is indeed required :-).
>    

An interesting (but perhaps difficult to achieve) optimization would be 
to spin in userspace.

> I'm using the futex_lock branch of my futextest test suite to gather results.
> The test is performance/futex_lock.c and can be checked out here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/dvhart/futextest.git
>
> Per Avi's suggestion I added asm instruction based period and duty-cycle options
> to do some testing. A series of plots with 256 threads, one per period length,
> is shown at the URL below. Each plot compares raw futex lock (normal, no
> spinning) with a single spinner (adaptive) and with multiple spinners (aas) for
> each of several duty-cycles to determine performance at various levels of
> contention. The test measure the number of lock/unlock pairs performed per
> second (in thousands).
>
> http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v4/
>
> Avi's suggested starting point was 1000 instructions with a 10% duty cycle.
> That plot "Futex Lock/Unlock (Period=1000, Threads=256)" does appear to be the
> most interesting of the lot. It's not so short that other overhead becomes the
> bottleneck, and not so long so as to make adaptive spinning benefits show up
> as noise in the numbers. The adaptive spin (with a single waiter) consistently
> beats the normal run, and outperforms aas for most duty-cycles. I rechecked a
> few points on this plot to confirm and the relative scores remained consistent.
> These plots were generated using 10,000,000 iterations per datapoint.
>
> Lastly, I should mention that these results all underperform a simple
> pthread_mutex_lock()/pthread_mutex_unlock() pair. I'm looking into why but felt
> I was overdue in sharing what I have to date. A test comparing this to a
> sched_yield() style userspace spinlock would probably be more appropraite.
>
>    

How many cores (or hardware threads) does this machine have?  At 10% 
duty cycle you have 25 waiters behind the lock on average (256 threads, 
each wanting the lock 10% of the time, is ~25 contenders).  I don't 
think this is realistic, and it means that spinning is invoked only rarely.

I'd be interested in seeing runs where the average number of waiters is 
0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention.  25 
average waiters on compute-bound code means the application needs to be 
rewritten; no amount of mutex tweaking will help it.

Does the wakeup code select the spinning waiter, or just a random waiter?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 21:15 ` [PATCH V2 " Avi Kivity
@ 2010-04-05 21:54   ` Darren Hart
  2010-04-05 22:21     ` Avi Kivity
  2010-04-06  8:48   ` Peter Zijlstra
  1 sibling, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-05 21:54 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

Avi Kivity wrote:
> On 04/05/2010 11:23 PM, Darren Hart wrote:
>> In-Reply-To:
>>
>> NOT FOR INCLUSION
>>
>> The following patch series implements a new experimental kernel side 
>> futex mutex
>> via new FUTEX_LOCK and FUTEX_LOCK_ADAPTIVE futex op codes. The 
>> adaptive spin
>> follows the kernel mutex model of allowing one spinner until the lock is
>> released or the owner is descheduled. The patch currently allows the 
>> user to
>> specify if they want no spinning, a single adaptive spinner, or multiple
>> spinners (aggressive adaptive spinning, or aas... which I have 
>> mistyped as "ass"
>> enough times to realize a better term is indeed required :-).
>>    
> 
> An interesting (but perhaps difficult to achieve) optimization would be 
> to spin in userspace.

I couldn't think of a lightweight way to determine, from userspace, when 
the owner has been scheduled out. Kernel assistance is required. You 
could do this on the schedule() side of things, but I figured I'd get 
some strong pushback if I tried to add a hook into descheduling that 
flipped a bit in the futex value stating the owner was about to 
deschedule(). Still, that might be something to explore.

> 
> How many cores (or hardware threads) does this machine have?  

Sorry, I meant to include that. I tested on an 8-CPU (no hardware 
threads) 2.6 GHz Opteron 2218 (2 quad-core CPUs) system.

 > At 10%
> duty cycle you have 25 waiters behind the lock on average.  I don't 
> think this is realistic, and it means that spinning is invoked only rarely.

Perhaps some instrumentation is in order; it seems to get invoked enough 
to achieve some 20% increase in lock/unlock iterations. Perhaps another 
metric would be of more value - such as average wait time?

> I'd be interested in seeing runs where the average number of waiters is 
> 0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention.
> 25 average waiters on compute bound code means the application needs to be 
> rewritten, no amount of mutex tweaking will help it.

Perhaps something like NR_CPUS threads would be of more interest? At 10% 
that's about 0.8 average waiters, and at 25% it reaches the 2 at your 
upper limit. I could add a few more duty-cycle points and make 25% the 
max. I'll kick that off and post the results... probably tomorrow; 10M 
iterations takes a while, but makes the results relatively stable.

> Does the wakeup code select the spinning waiter, or just a random waiter?

The wakeup code selects the highest-priority task, in FIFO order, to 
wake up - however, under contention it is most likely going to go back 
to sleep, as another waiter will steal the lock out from under it. This 
locking strategy is unashamedly about as "unfair" as it gets.


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 21:54   ` Darren Hart
@ 2010-04-05 22:21     ` Avi Kivity
  2010-04-05 22:59       ` Darren Hart
  2010-04-05 23:15       ` Darren Hart
  0 siblings, 2 replies; 62+ messages in thread
From: Avi Kivity @ 2010-04-05 22:21 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On 04/06/2010 12:54 AM, Darren Hart wrote:
> Avi Kivity wrote:
>> On 04/05/2010 11:23 PM, Darren Hart wrote:
>>> In-Reply-To:
>>>
>>> NOT FOR INCLUSION
>>>
>>> The following patch series implements a new experimental kernel side 
>>> futex mutex
>>> via new FUTEX_LOCK and FUTEX_LOCK_ADAPTIVE futex op codes. The 
>>> adaptive spin
>>> follows the kernel mutex model of allowing one spinner until the 
>>> lock is
>>> released or the owner is descheduled. The patch currently allows the 
>>> user to
>>> specify if they want no spinning, a single adaptive spinner, or 
>>> multiple
>>> spinners (aggressive adaptive spinning, or aas... which I have 
>>> mistyped as "ass"
>>> enough times to realize a better term is indeed required :-).
>>
>> An interesting (but perhaps difficult to achieve) optimization would 
>> be to spin in userspace.
>
> I couldn't think of a lightweight way to determine when the owner has 
> been scheduled out in userspace. Kernel assistance is required. You 
> could do this on the schedule() side of things, but I figured I'd get 
> some strong pushback if I tried to add a hook into descheduling that 
> flipped a bit in the futex value stating the owner was about to 
> deschedule(). Still, that might be something to explore.

In the futex value it's hopeless (since a thread can hold many locks), 
but I don't think it's unreasonable to set a bit in the thread local 
storage area.  The futex format would then need to be extended to 
contain a pointer to this bit.


>
>>
>> How many cores (or hardware threads) does this machine have? 
>
> Sorry, I meant to include that. I tested on an 8 CPU (no hardware 
> threads) 2.6 GHz Opteron 2218 (2 QuadCore CPUs) system.
>
> > At 10%
>> duty cycle you have 25 waiters behind the lock on average.  I don't 
>> think this is realistic, and it means that spinning is invoked only 
>> rarely.
>
> Perhaps some instrumentation is in order, it seems to get invoked 
> enough to achieve some 20% increase in lock/unlock iterations. Perhaps 
> another metric would be of more value - such as average wait time?

Why measure an unrealistic workload?

>
>> I'd be interested in seeing runs where the average number of waiters 
>> is 0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention.
>> 25 average waiters on compute bound code means the application needs 
>> to be rewritten, no amount of mutex tweaking will help it.
>
> Perhaps something NR_CPUS threads would be of more interest? 

That seems artificial.

> At 10% that's about .8 and at 25% the 2 of your upper limit. I could 
> add a few more duty-cycle points and make 25% the max. I'll kick that 
> off and post the results... probably tomorrow, 10M iterations takes a 
> while, but makes the results relatively stable.

Thanks.  But why not vary the number of threads as well?

>
>> Does the wakeup code select the spinning waiter, or just a random 
>> waiter?
>
> The wakeup code selects the highest priority task in fifo order to 
> wake-up - however, under contention it is most likely going to go back 
> to sleep as another waiter will steal the lock out from under it. This 
> locking strategy is unashamedly about as "unfair" as it gets.

Best to avoid the wakeup if we notice the lock was stolen.


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 22:21     ` Avi Kivity
@ 2010-04-05 22:59       ` Darren Hart
  2010-04-06 13:28         ` Avi Kivity
  2010-04-06 21:22         ` Darren Hart
  2010-04-05 23:15       ` Darren Hart
  1 sibling, 2 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-05 22:59 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

Avi Kivity wrote:

>> > At 10%
>>> duty cycle you have 25 waiters behind the lock on average.  I don't 
>>> think this is realistic, and it means that spinning is invoked only 
>>> rarely.
>>
>> Perhaps some instrumentation is in order, it seems to get invoked 
>> enough to achieve some 20% increase in lock/unlock iterations. Perhaps 
>> another metric would be of more value - such as average wait time?
> 
> Why measure an unrealistic workload?

No argument there, thus my proposal for an alternate configuration below.

>>> I'd be interested in seeing runs where the average number of waiters 
>>> is 0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention.
>>> 25 average waiters on compute bound code means the application needs 
>>> to be rewritten, no amount of mutex tweaking will help it.
>>
>> Perhaps something NR_CPUS threads would be of more interest? 
> 
> That seems artificial.

How so? Several real-world applications use one thread per CPU to 
dispatch work to, wait for events, etc.

> 
>> At 10% that's about .8 and at 25% the 2 of your upper limit. I could 
>> add a few more duty-cycle points and make 25% the max. I'll kick that 
>> off and post the results... probably tomorrow, 10M iterations takes a 
>> while, but makes the results relatively stable.
> 
> Thanks.  But why not vary the number of threads as well?

Absolutely, I don't disagree that all the variables should vary in order 
to get a complete picture. I'm starting with 8 threads - it takes several 
hours to collect the data.

>>> Does the wakeup code select the spinning waiter, or just a random 
>>> waiter?
>>
>> The wakeup code selects the highest priority task in fifo order to 
>> wake-up - however, under contention it is most likely going to go back 
>> to sleep as another waiter will steal the lock out from under it. This 
>> locking strategy is unashamedly about as "unfair" as it gets.
> 
> Best to avoid the wakeup if we notice the lock was stolen.

You really can't do this precisely. You can read the futex value at 
various points along the wakeup path, but at some point you have to 
commit to waking a task, and you still have a race between the time you 
wake_up_task() and when it is scheduled and attempts the cmpxchg itself.

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 22:21     ` Avi Kivity
  2010-04-05 22:59       ` Darren Hart
@ 2010-04-05 23:15       ` Darren Hart
  2010-04-05 23:29         ` Chris Wright
  2010-04-06 13:30         ` Avi Kivity
  1 sibling, 2 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-05 23:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

Avi Kivity wrote:

>>> An interesting (but perhaps difficult to achieve) optimization would 
>>> be to spin in userspace.
>>
>> I couldn't think of a lightweight way to determine when the owner has 
>> been scheduled out in userspace. Kernel assistance is required. You 
>> could do this on the schedule() side of things, but I figured I'd get 
>> some strong pushback if I tried to add a hook into descheduling that 
>> flipped a bit in the futex value stating the owner was about to 
>> deschedule(). Still, that might be something to explore.
> 
> In the futex value it's hopeless (since a thread can hold many locks), 

It can, but there is a futex value per lock. If the task_struct had a 
list of held futex locks (as it does for pi futex locks) the 
deschedule() path could walk that and mark the FUTEX_OWNER_SLEEPING bit.


> but I don't think it's unreasonable to set a bit in the thread local 
> storage area.  The futex format would then need to be extended to 
> contain a pointer to this bit.

This appears to be 1 bit per task instead of 1 bit per lock. Also, the 
value is thread-specific... so how would a potential waiter be able to 
determine if the owner of a particular lock was running or not with this 
method?  ... maybe I'm missing some core bit about TLS... are you 
talking about pthread_key_create() and pthread_getspecific() ?

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 23:15       ` Darren Hart
@ 2010-04-05 23:29         ` Chris Wright
  2010-04-06 13:30         ` Avi Kivity
  1 sibling, 0 replies; 62+ messages in thread
From: Chris Wright @ 2010-04-05 23:29 UTC (permalink / raw)
  To: Darren Hart
  Cc: Avi Kivity, linux-kernel, Thomas Gleixner, Peter Zijlstra,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

* Darren Hart (dvhltc@us.ibm.com) wrote:
> Avi Kivity wrote:
>
>>>> An interesting (but perhaps difficult to achieve) optimization 
>>>> would be to spin in userspace.
>>>
>>> I couldn't think of a lightweight way to determine when the owner has 
>>> been scheduled out in userspace. Kernel assistance is required. You  
>>> could do this on the schedule() side of things, but I figured I'd get 
>>> some strong pushback if I tried to add a hook into descheduling that  
>>> flipped a bit in the futex value stating the owner was about to  
>>> deschedule(). Still, that might be something to explore.
>>
>> In the futex value it's hopeless (since a thread can hold many locks), 
>
> It can, but there is a futex value per lock. If the task_struct had a  
> list of held futex locks (as it does for pi futex locks) the  
> deschedule() path could walk that and mark the FUTEX_OWNER_SLEEPING bit.

You also have a notification scheme (preempt_notifiers will fire on
schedule out).  However, you'd have to register the notifiers from a
non-current context (i.e. on slow-path acquire, reaching out to the lock
owner and registering a notifier on the lock owner's behalf, which would
kind of defeat the point AFAICT).

thanks,
-chris

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] futex: handle timeout inside adaptive lock spin
  2010-04-05 20:23 ` [PATCH 5/6] futex: handle timeout inside adaptive lock spin Darren Hart
@ 2010-04-06  8:27   ` Thomas Gleixner
  2010-04-07 17:31     ` Darren Hart
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2010-04-06  8:27 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity

On Mon, 5 Apr 2010, Darren Hart wrote:

> Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
> ---
>  kernel/futex.c |   24 +++++++++++++++++++++---
>  1 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index c33ac2a..af61dcd 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2385,6 +2385,7 @@ out:
>  /**
>   * trylock_futex_adaptive() - Try to acquire the futex lock in a busy loop
>   * @uaddr: the futex user address
> + * @timeout: absolute timeout or NULL if none
>   *
>   * Try to acquire a futex lock in a loop until the owner changes or the owner
>   * is descheduled. To lock the futex, set the value to the current TID.
> @@ -2394,10 +2395,11 @@ out:
>   *  1 - Futex lock acquired
>   * <0 - On error
>   */
> -static int trylock_futex_adaptive(u32 __user *uaddr)
> +static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
>  {
>  	int ret = 0;
>  	u32 curval;
> +	ktime_t now;
>  
>  	for (;;) {
>  		struct thread_info *owner;
> @@ -2433,6 +2435,22 @@ static int trylock_futex_adaptive(u32 __user *uaddr)
>  		if (need_resched())
>  			break;
>  
> +		if (timeout) {
> +			now = ktime_get();

  Hmm. Calling that in every iteration might hurt, especially on non-TSC
  systems, but well...

> +/* FIXME: consider creating ktime_less_than(lhs, rhs) */

  No need. The .tv64 comparison works in both cases. :)
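
  i.e. something like this directly in the loop (a sketch):

	if (timeout) {
		now = ktime_get();
		/* .tv64 is valid for both ktime_t representations */
		if (now.tv64 >= timeout->tv64)
			break;
	}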

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 21:15 ` [PATCH V2 " Avi Kivity
  2010-04-05 21:54   ` Darren Hart
@ 2010-04-06  8:48   ` Peter Zijlstra
  2010-04-06 14:47     ` Ulrich Drepper
  1 sibling, 1 reply; 62+ messages in thread
From: Peter Zijlstra @ 2010-04-06  8:48 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Darren Hart, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On Tue, 2010-04-06 at 00:15 +0300, Avi Kivity wrote:
> 
> An interesting (but perhaps difficult to achieve) optimization would be 
> to spin in userspace. 

System call overhead isn't that large, but if you want to amortize that
you can do a very small spin in userspace and then take the syscall,
like:

  try
  spin
  try
  syscall

Complicating anything beyond that is just not worth it.
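
Spelled out, that's roughly (a sketch; try_acquire(), SPIN_COUNT and the
FUTEX_LOCK op are assumptions from this series, not existing API):

	#include <sys/syscall.h>
	#include <unistd.h>

	#define SPIN_COUNT 100

	/* 0 -> TID transition, per the FUTEX_LOCK value policy */
	static int try_acquire(int *w, int tid)
	{
		return __sync_bool_compare_and_swap(w, 0, tid);
	}

	static void lock_amortized(int *w, int tid)
	{
		int i;

		if (try_acquire(w, tid))		/* try */
			return;
		for (i = 0; i < SPIN_COUNT; i++) {	/* spin */
			__asm__ __volatile__("pause" ::: "memory");
			if (try_acquire(w, tid))	/* try */
				return;
		}
		/* syscall: sleep in the kernel until the lock is ours */
		syscall(SYS_futex, w, FUTEX_LOCK, 0, NULL, NULL, 0);
	}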


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 22:59       ` Darren Hart
@ 2010-04-06 13:28         ` Avi Kivity
  2010-04-06 13:35           ` Peter Zijlstra
  2010-04-06 21:22         ` Darren Hart
  1 sibling, 1 reply; 62+ messages in thread
From: Avi Kivity @ 2010-04-06 13:28 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On 04/06/2010 01:59 AM, Darren Hart wrote:
>
>
>>>> I'd be interested in seeing runs where the average number of 
>>>> waiters is 0.2, 0.5, 1, and 2, corresponding to moderate-to-bad 
>>>> contention.
>>>> 25 average waiters on compute bound code means the application 
>>>> needs to be rewritten, no amount of mutex tweaking will help it.
>>>
>>> Perhaps something NR_CPUS threads would be of more interest? 
>>
>> That seems artificial.
>
> How so? Several real world applications use one thread per CPU to 
> dispatch work to, wait for events, etc.

Yes, but that's the best case for spinning.  You could simply use a 
userspace spinlock in this case.
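
Such a spinlock is trivial (a sketch; only sane with threads pinned
one-per-CPU, and the "pause" is x86-specific):

	typedef struct { volatile int locked; } uspin_t;

	static void uspin_lock(uspin_t *l)
	{
		while (__sync_lock_test_and_set(&l->locked, 1))
			while (l->locked)
				__asm__ __volatile__("pause" ::: "memory");
	}

	static void uspin_unlock(uspin_t *l)
	{
		__sync_lock_release(&l->locked);
	}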


>
>>>> Does the wakeup code select the spinning waiter, or just a random 
>>>> waiter?
>>>
>>> The wakeup code selects the highest priority task in fifo order to 
>>> wake-up - however, under contention it is most likely going to go 
>>> back to sleep as another waiter will steal the lock out from under 
>>> it. This locking strategy is unashamedly about as "unfair" as it gets.
>>
>> Best to avoid the wakeup if we notice the lock was stolen.
>
> You really can't do this precisely. You can read the futex value at 
> various points along the wakeup path, but at some point you have to 
> commit to waking a task, and you still have a race between the time 
> you wake_up_task() and when it is scheduled and attempts the cmpxchg 
> itself.
>

The race is not a problem as it's just an optimization.  If you lose it 
you get a spurious wakeup; if you win you save some CPU cycles.
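
Something like this just before committing to the wakeup would do (a
sketch; the helper names follow this thread rather than mainline):

	u32 uval;

	/* re-check the futex word; a non-zero TID means it was stolen */
	if (!get_futex_value_locked(&uval, uaddr) &&
	    (uval & FUTEX_TID_MASK))
		return;			/* skip the pointless wakeup */
	wake_up_task(waiter);		/* worst case: spurious wakeup */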


-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 23:15       ` Darren Hart
  2010-04-05 23:29         ` Chris Wright
@ 2010-04-06 13:30         ` Avi Kivity
  1 sibling, 0 replies; 62+ messages in thread
From: Avi Kivity @ 2010-04-06 13:30 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On 04/06/2010 02:15 AM, Darren Hart wrote:
> Avi Kivity wrote:
>
>>>> An interesting (but perhaps difficult to achieve) optimization 
>>>> would be to spin in userspace.
>>>
>>> I couldn't think of a lightweight way to determine when the owner 
>>> has been scheduled out in userspace. Kernel assistance is required. 
>>> You could do this on the schedule() side of things, but I figured 
>>> I'd get some strong pushback if I tried to add a hook into 
>>> descheduling that flipped a bit in the futex value stating the owner 
>>> was about to deschedule(). Still, that might be something to explore.
>>
>> In the futex value it's hopeless (since a thread can hold many locks), 
>
> It can, but there is a futex value per lock. If the task_struct had a 
> list of held futex locks (as it does for pi futex locks) the 
> deschedule() path could walk that and mark the FUTEX_OWNER_SLEEPING bit.
>

You don't want the context-switch path to walk a list whose length is 
user-controlled.

>> but I don't think it's unreasonable to set a bit in the thread local 
>> storage area.  The futex format would then need to be extended to 
>> contain a pointer to this bit.
>
> This appears to be 1 bit per task instead of 1 bit per lock. 

Yes.  O(1) on context switch instead of O(n).

> Also, the value is thread-specific... so how would a potential waiter 
> be able to determine if the owner of a particular lock was running or 
> not with this method?  ... maybe I'm missing some core bit about 
> TLS... are you talking about pthread_key_create() and 
> pthread_getspecific() ?

The lock would need to contain a pointer to the owning task.  Could be 
set with cmpxchg{8,16}b on x86.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 13:28         ` Avi Kivity
@ 2010-04-06 13:35           ` Peter Zijlstra
  2010-04-06 13:41             ` Avi Kivity
  2010-04-06 13:51             ` Alan Cox
  0 siblings, 2 replies; 62+ messages in thread
From: Peter Zijlstra @ 2010-04-06 13:35 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Darren Hart, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
> 
> Yes, but that's the best case for spinning.  You could simply use a 
> userspace spinlock in this case. 

Userspace spinlocks are evil.. they should _never_ be used.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 13:35           ` Peter Zijlstra
@ 2010-04-06 13:41             ` Avi Kivity
  2010-04-06 14:09               ` Peter Zijlstra
  2010-04-06 13:51             ` Alan Cox
  1 sibling, 1 reply; 62+ messages in thread
From: Avi Kivity @ 2010-04-06 13:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Darren Hart, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On 04/06/2010 04:35 PM, Peter Zijlstra wrote:
> On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
>    
>> Yes, but that's the best case for spinning.  You could simply use a
>> userspace spinlock in this case.
>>      
> Userspace spinlocks are evil.. they should _never_ be used.
>    

But in this case they're fastest.  If we don't provide a non-evil 
alternative, people will use them.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 13:35           ` Peter Zijlstra
  2010-04-06 13:41             ` Avi Kivity
@ 2010-04-06 13:51             ` Alan Cox
  2010-04-06 15:28               ` Darren Hart
  1 sibling, 1 reply; 62+ messages in thread
From: Alan Cox @ 2010-04-06 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, Darren Hart, linux-kernel, Thomas Gleixner,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On Tue, 06 Apr 2010 15:35:31 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
> > 
> > Yes, but that's the best case for spinning.  You could simply use a 
> > userspace spinlock in this case. 
> 
> Userspace spinlocks are evil.. they should _never_ be used.

That's a gross and inaccurate simplification. For the case Avi is talking
about, spinning in userspace makes sense in a lot of environments. Once
you've got one thread pinned per CPU (or gang scheduling >-) ) there are
various environments where it makes complete and utter sense.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 13:41             ` Avi Kivity
@ 2010-04-06 14:09               ` Peter Zijlstra
  2010-04-06 16:10                 ` Avi Kivity
  0 siblings, 1 reply; 62+ messages in thread
From: Peter Zijlstra @ 2010-04-06 14:09 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Darren Hart, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On Tue, 2010-04-06 at 16:41 +0300, Avi Kivity wrote:
> On 04/06/2010 04:35 PM, Peter Zijlstra wrote:
> > On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
> >    
> >> Yes, but that's the best case for spinning.  You could simply use a
> >> userspace spinlock in this case.
> >>      
> > Userspace spinlocks are evil.. they should _never_ be used.
> >    
> 
> But in this case they're fastest.  If we don't provide a non-evil 
> alternative, people will use them.
> 

That's what FUTEX_LOCK is about.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive  spinning
  2010-04-06  8:48   ` Peter Zijlstra
@ 2010-04-06 14:47     ` Ulrich Drepper
  2010-04-06 14:51       ` Peter Zijlstra
  0 siblings, 1 reply; 62+ messages in thread
From: Ulrich Drepper @ 2010-04-06 14:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Avi Kivity, Darren Hart, linux-kernel, Thomas Gleixner,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On Tue, Apr 6, 2010 at 01:48, Peter Zijlstra <peterz@infradead.org> wrote:
>  try
>  spin
>  try
>  syscall

This has been available for a long time in the mutex implementation
(the PTHREAD_MUTEX_ADAPTIVE_NP mutex type).  It hasn't shown much
improvement, if any.  There were some people demanding this support, but
as far as I know they are not using it now.  This is adaptive
spinning, learning from previous calls how long to wait.  But it's
still unguided.  There is no way to get information like "the owner
has been descheduled".
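
For reference, selecting that type (glibc; _NP = non-portable):

	#define _GNU_SOURCE
	#include <pthread.h>

	static pthread_mutex_t lock;

	static void init_adaptive_lock(void)
	{
		pthread_mutexattr_t attr;

		pthread_mutexattr_init(&attr);
		pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
		pthread_mutex_init(&lock, &attr);
		pthread_mutexattr_destroy(&attr);
		/* pthread_mutex_lock() now spins briefly before doing
		   FUTEX_WAIT -- but is still blind to the owner's state */
	}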

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive  spinning
  2010-04-06 14:47     ` Ulrich Drepper
@ 2010-04-06 14:51       ` Peter Zijlstra
  2010-04-06 15:33         ` Darren Hart
  0 siblings, 1 reply; 62+ messages in thread
From: Peter Zijlstra @ 2010-04-06 14:51 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Avi Kivity, Darren Hart, linux-kernel, Thomas Gleixner,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On Tue, 2010-04-06 at 07:47 -0700, Ulrich Drepper wrote:
> On Tue, Apr 6, 2010 at 01:48, Peter Zijlstra <peterz@infradead.org>
> wrote:
> >  try
> >  spin
> >  try
> >  syscall
> 
> This is available for a long time in the mutex implementation
> (PTHREAD_MUTEX_ADAPTIVE_NP mutex type).  It hasn't show much
> improvement if any.  There were some people demanding this support for
> as far as I know they are not using it now.  This is adaptive
> spinning, learning from previous calls how long to wait.  But it's
> still unguided.  There is no way to get information like "the owner
> has been descheduled". 

That's where the FUTEX_LOCK thing comes in; it does all of that. The above
was a single spin loop to amortize the syscall overhead.

I wouldn't make it any more complex than a single pause instruction;
syscalls are terribly cheap these days.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 13:51             ` Alan Cox
@ 2010-04-06 15:28               ` Darren Hart
  2010-04-06 16:06                 ` Avi Kivity
  2010-04-06 16:44                 ` Alan Cox
  0 siblings, 2 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-06 15:28 UTC (permalink / raw)
  To: Alan Cox
  Cc: Peter Zijlstra, Avi Kivity, linux-kernel, Thomas Gleixner,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

Alan Cox wrote:
> On Tue, 06 Apr 2010 15:35:31 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
>> On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
>>> Yes, but that's the best case for spinning.  You could simply use a 
>>> userspace spinlock in this case. 
>> Userspace spinlocks are evil.. they should _never_ be used.
> 
> Thats a gross and inaccurate simplification. For the case Avi is talking
> about spinning in userspace makes sense in a lot of environments. Once
> you've got one thread pinned per cpu (or gang scheduling >-) ) there are
> various environments where it makes complete and utter sense.

Hi Alan,

Do you feel some of these situations would also benefit from some kernel 
assistance to stop spinning when the owner schedules out? Or are you 
saying that there are situations where pure userspace spinlocks will 
always be the best option?

If the latter, I'd think that they would also be situations where 
sched_yield() is not used as part of the spin loop. If so, then these 
are not our target situations for FUTEX_LOCK_ADAPTIVE, which hopes to 
provide a better informed mechanism for making spin or sleep decisions. 
If sleeping isn't part of the locking construct implementation, then 
FUTEX_LOCK_ADAPTIVE doesn't have much to offer.

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
                   ` (7 preceding siblings ...)
  2010-04-05 21:15 ` [PATCH V2 " Avi Kivity
@ 2010-04-06 15:29 ` Peter Zijlstra
  8 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2010-04-06 15:29 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity

On Mon, 2010-04-05 at 13:23 -0700, Darren Hart wrote:
> Lastly, I should mention that these results all underperform a simple
> pthread_mutex_lock()/pthread_mutex_unlock() pair. I'm looking into why but felt
> I was overdue in sharing what I have to date. A test comparing this to a
> sched_yield() style userspace spinlock would probably be more appropraite.

This really should be able to out-perform a regular pthread_mutex_lock();
we saw a significant performance gain when we added adaptive spins to
the kernel mutex implementation, so I'd expect a gain on the futex one
as well.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 14:51       ` Peter Zijlstra
@ 2010-04-06 15:33         ` Darren Hart
  2010-04-06 15:37           ` Peter Zijlstra
  0 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-06 15:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ulrich Drepper, Avi Kivity, linux-kernel, Thomas Gleixner,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

Peter Zijlstra wrote:
> On Tue, 2010-04-06 at 07:47 -0700, Ulrich Drepper wrote:
>> On Tue, Apr 6, 2010 at 01:48, Peter Zijlstra <peterz@infradead.org>
>> wrote:
>>>  try
>>>  spin
>>>  try
>>>  syscall
>> This is available for a long time in the mutex implementation
>> (PTHREAD_MUTEX_ADAPTIVE_NP mutex type).  It hasn't show much
>> improvement if any.  There were some people demanding this support for
>> as far as I know they are not using it now.  This is adaptive
>> spinning, learning from previous calls how long to wait.  But it's
>> still unguided.  There is no way to get information like "the owner
>> has been descheduled". 
> 
> That's where the FUTEX_LOCK thing comes in, it does all those, the above
> was a single spin loop to amortize the syscall overhead.
> 
> I wouldn't make it any more complex than a single pause ins, syscalls
> are terribly cheap these days.

And yet they still seem to have a real impact on the futex_lock 
benchmark. Perhaps I am just still looking at pathological cases, but 
there is a strong correlation between high syscall counts and really low 
iterations per second. Granted, this also correlates with lock 
contention. However, when using the same period and duty-cycle I find 
that a locking mechanism that makes significantly fewer syscalls also 
significantly outperforms one that makes more. Kind of handwavy still; 
I'll have more numbers this afternoon.

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 15:33         ` Darren Hart
@ 2010-04-06 15:37           ` Peter Zijlstra
  0 siblings, 0 replies; 62+ messages in thread
From: Peter Zijlstra @ 2010-04-06 15:37 UTC (permalink / raw)
  To: Darren Hart
  Cc: Ulrich Drepper, Avi Kivity, linux-kernel, Thomas Gleixner,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On Tue, 2010-04-06 at 08:33 -0700, Darren Hart wrote:
> Peter Zijlstra wrote:
> > On Tue, 2010-04-06 at 07:47 -0700, Ulrich Drepper wrote:
> >> On Tue, Apr 6, 2010 at 01:48, Peter Zijlstra <peterz@infradead.org>
> >> wrote:
> >>>  try
> >>>  spin
> >>>  try
> >>>  syscall
> >> This is available for a long time in the mutex implementation
> >> (PTHREAD_MUTEX_ADAPTIVE_NP mutex type).  It hasn't show much
> >> improvement if any.  There were some people demanding this support for
> >> as far as I know they are not using it now.  This is adaptive
> >> spinning, learning from previous calls how long to wait.  But it's
> >> still unguided.  There is no way to get information like "the owner
> >> has been descheduled". 
> > 
> > That's where the FUTEX_LOCK thing comes in, it does all those, the above
> > was a single spin loop to amortize the syscall overhead.
> > 
> > I wouldn't make it any more complex than a single pause ins, syscalls
> > are terribly cheap these days.
> 
> And yet they still seem to have a real impact on the futex_lock 
> benchmark. Perhaps I am just still looking at pathological cases, but 
> there is a strong correlation between high syscall counts and really low 
> iterations per second. Granted this also correlates with lock 
> contention. However, when using the same period and duty-cycle I find 
> that a locking mechanism that makes significantly fewer syscalls also 
> significantly outperforms one that makes more. Kind of handwavy stilly, 
> I'll have more numbers this afternoon.

Sure, but I'm still not sure why FUTEX_LOCK ends up making more syscalls
than FUTEX_WAIT-based locking. Both should only do the syscall when the
lock is contended, and both should only ever do 1 syscall per acquire,
right?




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 15:28               ` Darren Hart
@ 2010-04-06 16:06                 ` Avi Kivity
  2010-04-06 16:14                   ` Thomas Gleixner
  2010-04-06 16:44                 ` Alan Cox
  1 sibling, 1 reply; 62+ messages in thread
From: Avi Kivity @ 2010-04-06 16:06 UTC (permalink / raw)
  To: Darren Hart
  Cc: Alan Cox, Peter Zijlstra, linux-kernel, Thomas Gleixner,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On 04/06/2010 06:28 PM, Darren Hart wrote:
> Alan Cox wrote:
>> On Tue, 06 Apr 2010 15:35:31 +0200
>> Peter Zijlstra <peterz@infradead.org> wrote:
>>
>>> On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
>>>> Yes, but that's the best case for spinning.  You could simply use a 
>>>> userspace spinlock in this case. 
>>> Userspace spinlocks are evil.. they should _never_ be used.
>>
>> Thats a gross and inaccurate simplification. For the case Avi is talking
>> about spinning in userspace makes sense in a lot of environments. Once
>> you've got one thread pinned per cpu (or gang scheduling >-) ) there are
>> various environments where it makes complete and utter sense.
>
> Hi Alan,
>
> Do you feel some of these situations would also benefit from some 
> kernel assistance to stop spinning when the owner schedules out? Or 
> are you saying that there are situations where pure userspace 
> spinlocks will always be the best option?
>
> If the latter, I'd think that they would also be situations where 
> sched_yield() is not used as part of the spin loop. If so, then these 
> are not our target situations for FUTEX_LOCK_ADAPTIVE, which hopes to 
> provide a better informed mechanism for making spin or sleep 
> decisions. If sleeping isn't part of the locking construct 
> implementation, then FUTEX_LOCK_ADAPTIVE doesn't have much to offer.

IMO the best solution is to spin in userspace while the lock holder is 
running, and fall into the kernel when it is scheduled out.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 14:09               ` Peter Zijlstra
@ 2010-04-06 16:10                 ` Avi Kivity
  2010-04-06 16:53                   ` Alan Cox
  0 siblings, 1 reply; 62+ messages in thread
From: Avi Kivity @ 2010-04-06 16:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Darren Hart, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On 04/06/2010 05:09 PM, Peter Zijlstra wrote:
> On Tue, 2010-04-06 at 16:41 +0300, Avi Kivity wrote:
>    
>> On 04/06/2010 04:35 PM, Peter Zijlstra wrote:
>>      
>>> On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
>>>
>>>        
>>>> Yes, but that's the best case for spinning.  You could simply use a
>>>> userspace spinlock in this case.
>>>>
>>>>          
>>> Userspace spinlocks are evil.. they should _never_ be used.
>>>
>>>        
>> But in this case they're fastest.  If we don't provide a non-evil
>> alternative, people will use them.
>>
>>      
> That's what FUTEX_LOCK is about.
>    

That works for the uncontended case.  For the contended case, the waiter 
and the owner have to go into the kernel and back out to transfer 
ownership.  In the non-adaptive case you have to switch to the idle task 
and back as well, and send an IPI.  That's a lot of latency if the 
unlock happened just after the waiter started the descent into the kernel.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 16:06                 ` Avi Kivity
@ 2010-04-06 16:14                   ` Thomas Gleixner
  2010-04-06 16:20                     ` Avi Kivity
  2010-04-06 16:54                     ` Alan Cox
  0 siblings, 2 replies; 62+ messages in thread
From: Thomas Gleixner @ 2010-04-06 16:14 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Darren Hart, Alan Cox, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On Tue, 6 Apr 2010, Avi Kivity wrote:

> On 04/06/2010 06:28 PM, Darren Hart wrote:
> > Alan Cox wrote:
> > > On Tue, 06 Apr 2010 15:35:31 +0200
> > > Peter Zijlstra <peterz@infradead.org> wrote:
> > > 
> > > > On Tue, 2010-04-06 at 16:28 +0300, Avi Kivity wrote:
> > > > > Yes, but that's the best case for spinning.  You could simply use a
> > > > > userspace spinlock in this case. 
> > > > Userspace spinlocks are evil.. they should _never_ be used.
> > > 
> > > Thats a gross and inaccurate simplification. For the case Avi is talking
> > > about spinning in userspace makes sense in a lot of environments. Once
> > > you've got one thread pinned per cpu (or gang scheduling >-) ) there are
> > > various environments where it makes complete and utter sense.
> > 
> > Hi Alan,
> > 
> > Do you feel some of these situations would also benefit from some kernel
> > assistance to stop spinning when the owner schedules out? Or are you saying
> > that there are situations where pure userspace spinlocks will always be the
> > best option?
> > 
> > If the latter, I'd think that they would also be situations where
> > sched_yield() is not used as part of the spin loop. If so, then these are
> > not our target situations for FUTEX_LOCK_ADAPTIVE, which hopes to provide a
> > better informed mechanism for making spin or sleep decisions. If sleeping
> > isn't part of the locking construct implementation, then FUTEX_LOCK_ADAPTIVE
> > doesn't have much to offer.
> 
> IMO the best solution is to spin in userspace while the lock holder is
> running, fall into the kernel when it is scheduled out.

That's just not realistic, as user space has no idea whether the lock
holder is running or not, nor when it's scheduled out, without a syscall :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 16:14                   ` Thomas Gleixner
@ 2010-04-06 16:20                     ` Avi Kivity
  2010-04-07  6:18                       ` john cooper
  2010-04-06 16:54                     ` Alan Cox
  1 sibling, 1 reply; 62+ messages in thread
From: Avi Kivity @ 2010-04-06 16:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Darren Hart, Alan Cox, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On 04/06/2010 07:14 PM, Thomas Gleixner wrote:
>
>> IMO the best solution is to spin in userspace while the lock holder is
>> running, fall into the kernel when it is scheduled out.
>>      
> That's just not realistic as user space has no idea whether the lock
> holder is running or not and when it's scheduled out without a syscall :)
>    

The kernel could easily expose this information by writing into the 
thread's TLS area.

So, roughly (sketched in code below):

- the kernel maintains a current_cpu field in a thread's TLS
- lock() atomically writes a pointer to the current thread's current_cpu 
when acquiring
- the kernel writes an invalid value to current_cpu when switching out
- a contended lock() retrieves the current_cpu pointer, and spins as 
long as it is a valid CPU
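
A userspace-side sketch of that (everything here is hypothetical ABI: the
kernel-maintained current_cpu word, CPU_INVALID, and the futex_lock()
slow path):

	#define CPU_INVALID	-1

	extern __thread volatile int current_cpu;   /* updated by the kernel */
	extern void futex_lock(volatile int *word); /* returns with lock held */

	struct ulock {
		volatile int word;		/* 0 = free, else owner TID */
		volatile int *owner_cpu;	/* -> owner's current_cpu */
	};

	static void ulock_lock(struct ulock *l, int tid)
	{
		for (;;) {
			if (__sync_bool_compare_and_swap(&l->word, 0, tid))
				break;		/* got it in userspace */

			/* owner still on a CPU: keep spinning */
			while (l->word && l->owner_cpu &&
			       *l->owner_cpu != CPU_INVALID)
				__asm__ __volatile__("pause" ::: "memory");

			if (!l->word)
				continue;	/* released: retry cmpxchg */

			futex_lock(&l->word);	/* owner off CPU: sleep */
			break;
		}
		/* publish where waiters should look; ideally this is atomic
		   with the acquisition itself, e.g. via cmpxchg16b */
		l->owner_cpu = &current_cpu;
	}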

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 15:28               ` Darren Hart
  2010-04-06 16:06                 ` Avi Kivity
@ 2010-04-06 16:44                 ` Alan Cox
  2010-04-06 17:34                   ` Ulrich Drepper
  2010-04-06 19:31                   ` Thomas Gleixner
  1 sibling, 2 replies; 62+ messages in thread
From: Alan Cox @ 2010-04-06 16:44 UTC (permalink / raw)
  To: Darren Hart
  Cc: Peter Zijlstra, Avi Kivity, linux-kernel, Thomas Gleixner,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

> Do you feel some of these situations would also benefit from some kernel 
> assistance to stop spinning when the owner schedules out? Or are you 
> saying that there are situations where pure userspace spinlocks will 
> always be the best option?

There are cases where it's the best option - you are assuming, for example,
that the owner can get scheduled out. E.g. nailing one thread per CPU in
some specialist high-performance situations means they can't.

> If the latter, I'd think that they would also be situations where 
> sched_yield() is not used as part of the spin loop. If so, then these 
> are not our target situations for FUTEX_LOCK_ADAPTIVE, which hopes to 
> provide a better informed mechanism for making spin or sleep decisions. 
> If sleeping isn't part of the locking construct implementation, then 
> FUTEX_LOCK_ADAPTIVE doesn't have much to offer.

I am unsure about the approach. As Avi says, knowing that the lock owner is
scheduled out allows for far better behaviour. It doesn't need complex
per-lock stuff or per-lock notifier entries on pre-empt either.

A given task is either pre-empted or not, and in the normal case of things
you need this within a process, so you've got shared pages anyway. So you
only need one instance of the 'is thread X pre-empted' bit somewhere in a
non-swappable page.

That gives you something along the lines of

	runaddr = find_run_flag(lock);
	do {
		while (*runaddr == RUNNING) {
			if (trylock(lock))
				return WHOOPEE;
			cpu_relax();
		}
		yield_on(thread);
	} while (*runaddr != DEAD);


which, unlike blindly spinning, can avoid the worst of any hit on CPU
power and would be a bit more guided?


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 16:10                 ` Avi Kivity
@ 2010-04-06 16:53                   ` Alan Cox
  0 siblings, 0 replies; 62+ messages in thread
From: Alan Cox @ 2010-04-06 16:53 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Peter Zijlstra, Darren Hart, linux-kernel, Thomas Gleixner,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

> That works for the uncontended case.  For the contended case, the waiter 
> and the owner have to go into the kernel and back out to transfer 
> ownership.

If you insist on doing it that way, yes. But knowing the lock owner is
likely to be away for a while also lets you do things like punt work -
either by picking another work package in the meantime, or by queueing
the work you can't do on a list the pre-empted lock owner will review
before dropping the lock.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 16:14                   ` Thomas Gleixner
  2010-04-06 16:20                     ` Avi Kivity
@ 2010-04-06 16:54                     ` Alan Cox
  2010-04-06 18:15                       ` Thomas Gleixner
  1 sibling, 1 reply; 62+ messages in thread
From: Alan Cox @ 2010-04-06 16:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Avi Kivity, Darren Hart, Peter Zijlstra, linux-kernel,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

> > IMO the best solution is to spin in userspace while the lock holder is
> > running, fall into the kernel when it is scheduled out.
> 
> That's just not realistic as user space has no idea whether the lock
> holder is running or not and when it's scheduled out without a syscall :)

Which is the real problem that wants addressing and can be addressed very
cheaply. That would bring us up to par with 1970s RTOS environments ;)

Alan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-04-05 20:23 ` [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning Darren Hart
@ 2010-04-06 16:55   ` Thomas Gleixner
  2010-04-07 17:26     ` Darren Hart
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2010-04-06 16:55 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Peter Zijlstra

On Mon, 5 Apr 2010, Darren Hart wrote:
> +#ifdef CONFIG_SMP
> +struct thread_info *futex_owner(u32 __user *uaddr)
> +{
> +	struct thread_info *ti = NULL;
> +	struct task_struct *p;
> +	pid_t pid;
> +	u32 uval;
> +
> +	if (get_futex_value_locked(&uval, uaddr))
> +		return NULL;
> +
> +	pid = uval & FUTEX_TID_MASK;
> +	rcu_read_lock();
> +	p = find_task_by_vpid(pid);
> +	if (p) {
> +		const struct cred *cred, *pcred;
> +
> +		cred = current_cred();
> +		pcred = __task_cred(p);
> +		if (cred->euid == pcred->euid ||
> +		    cred->euid == pcred->uid)
> +			ti = task_thread_info(p);

  We want a get_task_struct() here, don't we ?

> +	}
> +	rcu_read_unlock();
> +
> +	return ti;
> +}
> +#endif
> +

> +/**
> + * trylock_futex_adaptive() - Try to acquire the futex lock in a busy loop
> + * @uaddr: the futex user address
> + *
> + * Try to acquire a futex lock in a loop until the owner changes or the owner
> + * is descheduled. To lock the futex, set the value to the current TID.
> + *
> + * Returns:
> + *  0 - Gave up, lock not acquired
> + *  1 - Futex lock acquired
> + * <0 - On error
> + */
> +static int trylock_futex_adaptive(u32 __user *uaddr)
> +{
> +	int ret = 0;
> +	u32 curval;
> +
> +	for (;;) {
> +		struct thread_info *owner;
> +
> +		owner = futex_owner(uaddr);
> +		if (owner && futex_spin_on_owner(uaddr, owner))
> +			break;
> +
> +		/*
> +		 * Try to acquire the lock. If we acquire it or error out,
> +		 * break to enable preemption and handle ret outside the loop.
> +		 */
> +		curval = cmpxchg_futex_value_locked(uaddr, 0,
> +		                                    task_pid_vnr(current));
> +
> +		if (curval == 0) {
> +			ret = 1;
> +			break;
> +		}
> +
> +		if (unlikely(curval == -EFAULT)) {
> +			ret = -EFAULT;
> +			break;
> +		}
> +
> +		/*
> +		 * Futex locks manage the owner atomically. We are not in
> +		 * danger of deadlock by preempting a pending owner. Abort the
> +		 * loop if our time slice has expired.  RT Threads can spin
> +		 * until they preempt the owner, at which point they will break
> +		 * out of the loop anyway.
> +		 */
> +		if (need_resched())
> +			break;
> +
> +		cpu_relax();
> +	}
> +	return ret;
> +}


Hmm. The order is weird. Why not do it more simply?

Get the uval, the tid and the thread_info pointer outside of the
loop. Also, task_pid_vnr(current) just needs a one-time lookup.

change the loop to do:

       for (;;) {
	   curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);

	   if (!curval)
	      return 1;

	   if ((curval & FUTEX_TID_MASK) != ownertid) {
	      ownertid = curval & FUTEX_TID_MASK;
	      owner = update_owner(ownertid);
	   }

	   if (owner && futex_spin_on_owner(....))
	   .....
       }

> +/**
> + * futex_lock() - Acquire the lock and set the futex value
> + * @uaddr:  the futex user address
> + * @flags:  futex flags (FLAGS_SHARED, FLAGS_CLOCKRT, FLAGS_ADAPTIVE, etc.)
> + * @detect: detect deadlock (1) or not (0)
> + * @time:   absolute timeout
> + *
> + * futex_(un)lock() define a futex value policy and implement a full mutex. The
> + * futex value stores the owner's TID or'd with FUTEX_WAITERS and/or
> + * FUTEX_OWNER_DIED as appropriate.
> + *
> + * Userspace tried a 0 -> TID atomic transition of the futex value and failed.
> + * Try to acquire the lock in the kernel, blocking if necessary. Return to
> + * userspace with the lock held and the futex value set accordingly (or after a
> + * timeout).
> + *
> + * Returns:
> + *  0 - On success
> + * <0 - On error
> + */
> +static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
> +{
> +	struct hrtimer_sleeper timeout, *to = NULL;
> +	struct futex_hash_bucket *hb;
> +	struct futex_q q = FUTEX_Q_INIT;
> +	int ret = 0;
> +
> +	if (refill_pi_state_cache())
> +		return -ENOMEM;
> +
> +	if (time) {
> +		to = &timeout;
> +		hrtimer_init_on_stack(&to->timer, CLOCK_REALTIME,
> +				      HRTIMER_MODE_ABS);

  Please make that like the WAIT_BITSET one, which can select the
  clock and defaults to MONOTONIC.

> +		hrtimer_init_sleeper(to, current);
> +		hrtimer_set_expires(&to->timer, *time);
> +	}

  Why setup all this _before_ trying the adaptive spin ?

> +retry:
> +#ifdef CONFIG_SMP
> +	if (flags & FLAGS_ADAPTIVE) {

  Why do we want that non adaptive version at all ?

> +		preempt_disable();
> +		ret = trylock_futex_adaptive(uaddr);
> +		preempt_enable();
> +
> +		/* We got the lock. */
> +		if (ret == 1) {
> +			ret = 0;
> +			goto out;
> +		}
> +
> +		/* We encountered an error, -EFAULT most likely. */
> +		if (ret)
> +			goto out;
> +	}
> +#endif
> +	q.key = FUTEX_KEY_INIT;
> +	ret = get_futex_key(uaddr, flags & FLAGS_SHARED, &q.key);
> +	if (unlikely(ret != 0))
> +		goto out;
> +
> +	hb = queue_lock(&q);
> +
> +	ret = lock_futex_atomic(uaddr, current, 0);
> +	if (unlikely(ret)) {
> +		switch (ret) {
> +		case 1:
> +			/* We got the lock. */
> +			ret = 0;
> +			goto out_unlock_put_key;
> +		case -EFAULT:
> +			goto uaddr_faulted;
> +		default:
> +			goto out_unlock_put_key;
> +		}
> +	}
> +	/*
> +	 * Only actually queue now that the atomic ops are done:
> +	 */
> +	futex_wait_queue_me(hb, &q, to);
> +
> +	ret = unqueue_me(&q);
> +
> +	/* If we were woken by the owner, try and acquire the lock. */
> +	if (!ret)
> +		goto retry;
> +
> +	ret = -ETIMEDOUT;
> +	if (to && !to->task)
> +		goto out_put_key;
> +
> +	/*
> +	 * We expect signal_pending(current), but we might be the
> +	 * victim of a spurious wakeup as well.
> +         */
> +	ret = -ERESTARTNOINTR;
> +	if (!signal_pending(current)) {
> +		put_futex_key(&q.key);
> +		goto retry;
> +	}
> +
> +	goto out_put_key;
> +
> +out_unlock_put_key:
> +	queue_unlock(&q, hb);
> +
> +out_put_key:
> +	put_futex_key(&q.key);
> +out:
> +	if (to)
> +		destroy_hrtimer_on_stack(&to->timer);
> +	return ret != -EINTR ? ret : -ERESTARTNOINTR;

  Which code sets -EINTR ?

> +
> +uaddr_faulted:
> +	queue_unlock(&q, hb);
> +
> +	ret = fault_in_user_writeable(uaddr);
> +	if (ret)
> +		goto out_put_key;
> +
> +	put_futex_key(&q.key);
> +	goto retry;
> +}
> +
> +
>  /*
>   * Support for robust futexes: the kernel cleans up held futexes at
>   * thread exit time.
> @@ -2623,6 +2830,12 @@ long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout,
>  	case FUTEX_CMP_REQUEUE_PI:
>  		ret = futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
>  		break;
> +	case FUTEX_LOCK_ADAPTIVE:
> +		flags |= FLAGS_ADAPTIVE;
> +	case FUTEX_LOCK:
> +		if (futex_cmpxchg_enabled)
> +			ret = futex_lock(uaddr, flags, val, timeout);
> +		break;
>  	default:
>  		ret = -ENOSYS;
>  	}
> @@ -2641,7 +2854,8 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
>  
>  	if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
>  		      cmd == FUTEX_WAIT_BITSET ||
> -		      cmd == FUTEX_WAIT_REQUEUE_PI)) {
> +		      cmd == FUTEX_WAIT_REQUEUE_PI ||
> +		      cmd == FUTEX_LOCK || cmd == FUTEX_LOCK_ADAPTIVE)) {
>  		if (copy_from_user(&ts, utime, sizeof(ts)) != 0)
>  			return -EFAULT;
>  		if (!timespec_valid(&ts))
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 9ab3cd7..a2dbb5b 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -3818,6 +3818,65 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
>  out:
>  	return 1;
>  }
> +
> +#ifdef CONFIG_FUTEX
> +#include <linux/futex.h>
> +
> +int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
> +{
> +	unsigned int cpu;
> +	struct rq *rq;
> +
> +	if (!sched_feat(OWNER_SPIN))
> +		return 0;
> +
> +#ifdef CONFIG_DEBUG_PAGEALLOC
> +	/*
> +	 * Need to access the cpu field knowing that
> +	 * DEBUG_PAGEALLOC could have unmapped it if
> +	 * the mutex owner just released it and exited.
> +	 */
> +	if (probe_kernel_address(&owner->cpu, cpu))
> +		goto out;
> +#else
> +	cpu = owner->cpu;
> +#endif
> +
> +	/*
> +	 * Even if the access succeeded (likely case),
> +	 * the cpu field may no longer be valid.
> +	 */
> +	if (cpu >= nr_cpumask_bits)
> +		goto out;
> +
> +	/*
> +	 * We need to validate that we can do a
> +	 * get_cpu() and that we have the percpu area.
> +	 */
> +	if (!cpu_online(cpu))
> +		goto out;
> +
> +	rq = cpu_rq(cpu);
> +
> +	for (;;) {
> +		/*
> +		 * Owner changed, break to re-assess state.
> +		 */
> +		if (futex_owner(uaddr) != owner)
> +			break;

  Uurg. It's enough to check whether the TID value changed. No need to
  look up the full thing in every iteration.

> +		/*
> +		 * Is that owner really running on that cpu?
> +		 */
> +		if (task_thread_info(rq->curr) != owner || need_resched())
> +			return 0;
> +
> +		cpu_relax();
> +	}
> +out:
> +	return 1;
> +}
> +#endif
>  #endif

Do we really need all this code ? A simple owner->on_cpu (owner needs
to be the task_struct then) would be sufficient to figure that out,
wouldn't it?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive  spinning
  2010-04-06 16:44                 ` Alan Cox
@ 2010-04-06 17:34                   ` Ulrich Drepper
  2010-04-10 23:35                     ` Alan Cox
  2010-04-06 19:31                   ` Thomas Gleixner
  1 sibling, 1 reply; 62+ messages in thread
From: Ulrich Drepper @ 2010-04-06 17:34 UTC (permalink / raw)
  To: Alan Cox
  Cc: Darren Hart, Peter Zijlstra, Avi Kivity, linux-kernel,
	Thomas Gleixner, Ingo Molnar, Eric Dumazet, Peter W. Morreale,
	Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright

On Tue, Apr 6, 2010 at 09:44, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> That gives you something along the lines of
>
>        runaddr = find_run_flag(lock);
>        do {
>                while(*runaddr == RUNNING) {
>                        if (trylock(lock))
>                                return WHOOPEE;
>                        cpu relax
>                }
>                yield (_on(thread));
>        } while(*runaddr != DEAD);

There still has to be an upper limit on the number of rounds of the
wait loop (some locks are held for a long time) since otherwise CPUs
are tied up unnecessarily long.  And the DEAD case is only for robust
mutex handling.  But in theory I agree.

We already have the set_tid_address syscall.  This could be
generalized with a new syscall which can provide the kernel with more
than one pointer to store "stuff" in: TIDs, scheduling info, etc.
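
For illustration only -- every name below is made up -- such a
registration could look roughly like:

    /* Hypothetical interface; nothing like this exists today. */
    struct thread_slots {
            pid_t tid;          /* what set_tid_address() covers today */
            int current_cpu;    /* kernel writes -1 when descheduled */
    };

    /* Tell the kernel which per-thread area to keep updated. */
    long sys_set_thread_slots(struct thread_slots *slots);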

The non-swappable part will be tricky.  One doesn't know how many
threads will be created in a process.  This mechanism shouldn't put an
arbitrary limit in place.  So where to allocate the memory?  Perhaps
it's better to implicitly mark the memory page pointed to by the new
syscall as non-swappable?  This could mean one page per thread...

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 16:54                     ` Alan Cox
@ 2010-04-06 18:15                       ` Thomas Gleixner
  0 siblings, 0 replies; 62+ messages in thread
From: Thomas Gleixner @ 2010-04-06 18:15 UTC (permalink / raw)
  To: Alan Cox
  Cc: Avi Kivity, Darren Hart, Peter Zijlstra, linux-kernel,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On Tue, 6 Apr 2010, Alan Cox wrote:

> > > IMO the best solution is to spin in userspace while the lock holder is
> > > running, fall into the kernel when it is scheduled out.
> > 
> > That's just not realistic as user space has no idea whether the lock
> > holder is running or not and when it's scheduled out without a syscall :)
> 
> Which is the real problem that wants addressing and can be addressed very
> cheaply. That would bring us up to par with 1970s RTOS environments ;)

Well, 1970's RTOSes had other features as well like preemption disable
mechanisms and other interesting stuff. I hope you're not going to
propose that next to bring us up to par with those :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 16:44                 ` Alan Cox
  2010-04-06 17:34                   ` Ulrich Drepper
@ 2010-04-06 19:31                   ` Thomas Gleixner
  2010-04-06 20:02                     ` Ulrich Drepper
  2010-04-07  5:33                     ` Avi Kivity
  1 sibling, 2 replies; 62+ messages in thread
From: Thomas Gleixner @ 2010-04-06 19:31 UTC (permalink / raw)
  To: Alan Cox
  Cc: Darren Hart, Peter Zijlstra, Avi Kivity, linux-kernel,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On Tue, 6 Apr 2010, Alan Cox wrote:
> > Do you feel some of these situations would also benefit from some kernel 
> > assistance to stop spinning when the owner schedules out? Or are you 
> > saying that there are situations where pure userspace spinlocks will 
> > always be the best option?
> 
> There are cases its the best option - you are assuming for example that
> the owner can get scheduled out. Eg nailing one thread per CPU in some
> specialist high performance situations means they can't.

Fair enough, but that's not the problem Darren is targeting.
 
> > If the latter, I'd think that they would also be situations where 
> > sched_yield() is not used as part of the spin loop. If so, then these 
> > are not our target situations for FUTEX_LOCK_ADAPTIVE, which hopes to 
> > provide a better informed mechanism for making spin or sleep decisions. 
> > If sleeping isn't part of the locking construct implementation, then 
> > FUTEX_LOCK_ADAPTIVE doesn't have much to offer.
> 
> I am unsure about the approach. As Avi says knowing that the lock owner is
> scheduled out allows for far better behaviour. It doesn't need complex
> per lock stuff or per lock notifier entries on pre-empt either.
> 
> A given task is either pre-empted or not and in the normal case of things
> you need this within a process so you've got shared pages anyway. So you
> only need one instance of the 'is thread X pre-empted' bit somewhere in a
> non swappable page.

I fear we might end up with a pinned page per thread to get this
working properly and it restricts the mechanism to process private
locks.

> That gives you something along the lines of
> 
> 	runaddr = find_run_flag(lock);
> 	do {
> 		while(*runaddr == RUNNING) {
> 			if (trylock(lock))
> 				return WHOOPEE;
> 			cpu relax
> 		}
> 		yield (_on(thread));

That would require a new yield_to_target() syscall, which either
blocks the caller when the target thread is not runnable or returns an
error code which signals to go into the slow path.

> 	} while(*runaddr != DEAD);
> 
> 
> which unlike blindly spinning can avoid the worst of any hit on the CPU
> power and would be a bit more guided ?

I doubt that the syscall overhead per se is large enough to justify
all of the above horror. We need to figure out a more efficient way to
do the spinning in the kernel where we have all the necessary
information already. Darren's implementation is suboptimal AFAICT.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive  spinning
  2010-04-06 19:31                   ` Thomas Gleixner
@ 2010-04-06 20:02                     ` Ulrich Drepper
  2010-04-06 23:16                       ` Thomas Gleixner
  2010-04-07  5:33                     ` Avi Kivity
  1 sibling, 1 reply; 62+ messages in thread
From: Ulrich Drepper @ 2010-04-06 20:02 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alan Cox, Darren Hart, Peter Zijlstra, Avi Kivity, linux-kernel,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On Tue, Apr 6, 2010 at 12:31, Thomas Gleixner <tglx@linutronix.de> wrote:
> We need to figure out a more efficient way to
> do the spinning in the kernel where we have all the necessary
> information already.

Really?  The owner information isn't in general available in the
kernel.  Futex operation doesn't require the value used to be the PID
(or negative of the PID).  That is a dramatic limitation of the
usefulness of futexes.

At userlevel there is access to other fields of the data structure
which can contain the owner information.
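
For example, a userlevel lock can keep the owner next to, rather than
inside, the futex word (sketch):

    struct ulock {
            unsigned int futex; /* free to use all 2^32 states */
            pid_t owner;        /* written by the acquirer; advisory */
    };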

I would like to see the method using a per-thread pinned page and an
update of a memory location on scheduling.  For benchmarking at least.
 I agree that a sys_yield_to() syscall would be at the very least
useful as well.  But it's useful for other things already.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-05 22:59       ` Darren Hart
  2010-04-06 13:28         ` Avi Kivity
@ 2010-04-06 21:22         ` Darren Hart
  1 sibling, 0 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-06 21:22 UTC (permalink / raw)
  To: Avi Kivity
  Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

Darren Hart wrote:
> Avi Kivity wrote:
> 
>>>> At 10%
>>>> duty cycle you have 25 waiters behind the lock on average.  I don't 
>>>> think this is realistic, and it means that spinning is invoked only 
>>>> rarely.
>>>
>>> Perhaps some instrumentation is in order, it seems to get invoked 
>>> enough to achieve some 20% increase in lock/unlock iterations. 
>>> Perhaps another metric would be of more value - such as average wait 
>>> time?
>>
>> Why measure an unrealistic workload?
> 
> No argument there, thus my proposal for an alternate configuration below.
> 
>>>> I'd be interested in seeing runs where the average number of waiters 
>>>> is 0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention.
>>>> 25 average waiters on compute bound code means the application needs 
>>>> to be rewritten, no amount of mutex tweaking will help it.
>>>
>>> Perhaps something like NR_CPUS threads would be of more interest?
>>
>> That seems artificial.
> 
> How so? Several real world applications use one thread per CPU to 
> dispatch work to, wait for events, etc.
> 
>>
>>> At 10% that's about .8 and at 25% the 2 of your upper limit. I could 
>>> add a few more duty-cycle points and make 25% the max. I'll kick that 
>>> off and post the results... probably tomorrow, 10M iterations takes a 
>>> while, but makes the results relatively stable.
>>
>> Thanks.  But why not vary the number of threads as well?
> 
> Absolutely, I don't disagree that all the variables should vary in order 
> to get a complete picture. I'm starting with 8 - it takes several hours 
> to collect the data.

While this might be of less interest after today's discussion, I 
promised to share the results of a run with 8 threads with a wider 
selection of lower duty-cycles. The results are very poor for adaptive 
and worse for aas (multiple spinners) compared to normal FUTEX_LOCK. As 
Thomas and Peter have pointed out, the implementation is sub-optimal. 
Before abandoning this approach I will see if I can find the bottlenecks 
and simplify the kernel side of things. My impression is that I am doing 
a lot more work in the kernel, especially in the adaptive loop, than is 
really necessary.

Both the 8 and 256 Thread plots can be viewed here:

http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v4/

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 20:02                     ` Ulrich Drepper
@ 2010-04-06 23:16                       ` Thomas Gleixner
  2010-04-06 23:36                         ` Darren Hart
  2010-04-07  6:08                         ` drepper
  0 siblings, 2 replies; 62+ messages in thread
From: Thomas Gleixner @ 2010-04-06 23:16 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Alan Cox, Darren Hart, Peter Zijlstra, Avi Kivity, linux-kernel,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On Tue, 6 Apr 2010, Ulrich Drepper wrote:

> On Tue, Apr 6, 2010 at 12:31, Thomas Gleixner <tglx@linutronix.de> wrote:
> > We need to figure out a more efficient way to
> > do the spinning in the kernel where we have all the necessary
> > information already.
> 
> Really?  The owner information isn't in general available in the
> kernel.  Futex operation doesn't require the value used to be the PID
> (or negative of the PID).  That is a dramatic limitation of the
> usefulness of futexes.

I know that you can do any weird stuff with the futex value, but I
don't see the "dramatic" limitation. Care to elaborate ?

> At userlevel there is access to other fields of the data structure
> which can contain the owner information.
> 
> I would like to see the method using a per-thread pinned page and an
> update of a memory location on scheduling.  For benchmarking at least.

The per thread pinned page would be unconditional, right ?

I agree that benchmarking would be interesting, but OTOH I fear that
we open up a huge can of worms with exposing scheduler details and the
related necessary syscalls like sys_yield_to: User space thread
management/scheduling comes to my mind and I hope we agree that we do
not want to revisit that.

>  I agree that a sys_yield_to() syscall would be at the very least
> useful as well.  But it's useful for other things already.

Useful for what ?

What are the exact semantics of such a syscall ? 

How does that fit into the various scheduling constraints ?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 23:16                       ` Thomas Gleixner
@ 2010-04-06 23:36                         ` Darren Hart
  2010-04-07  6:08                         ` drepper
  1 sibling, 0 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-06 23:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ulrich Drepper, Alan Cox, Peter Zijlstra, Avi Kivity,
	linux-kernel, Ingo Molnar, Eric Dumazet, Peter W. Morreale,
	Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright

Thomas Gleixner wrote:
> On Tue, 6 Apr 2010, Ulrich Drepper wrote:
> 
>> On Tue, Apr 6, 2010 at 12:31, Thomas Gleixner <tglx@linutronix.de> wrote:
>>> We need to figure out a more efficient way to
>>> do the spinning in the kernel where we have all the necessary
>>> information already.
>> Really?  The owner information isn't in general available in the
>> kernel.  Futex operation doesn't require the value used to be the PID
>> (or negative of the PID).  That is a dramatic limitation of the
>> usefulness of futexes.
> 
> I know that you can do any weird stuff with the futex value, but I
> don't see the "dramatic" limitation. Care to elaborate ?
> 
>> At userlevel there is access to other fields of the data structure
>> which can contain the owner information.
>>
>> I would like to see the method using a per-thread pinned page and an
>> update of a memory location on scheduling.  For benchmarking at least.
> 
> The per thread pinned page would be unconditional, right ?
> 
> I agree that benchmarking would be interesting, but OTOH I fear that
> we open up a huge can of worms with exposing scheduler details and the
> related necessary syscalls like sys_yield_to: User space thread
> management/scheduling comes to my mind and I hope we agree that we do
> not want to revisit that.
> 
>>  I agree that a sys_yield_to() syscall would be at the very least
>> useful as well.  But it's useful for other things already.
> 
> Useful for what ?
> 
> What are the exact semantics of such a syscall ? 
> 
> How does that fit into the various scheduling constraints ?


I believe this comes back to the discussions of a directed yield. The 
idea being that a thread yields its remaining timeslice to a thread of 
its choosing - usually because the target thread holds a resource the 
yielding thread needs access to. This makes the yield more explicit so 
the yielding thread is more likely to get some benefit out of yielding.

I believe the arguments would be either a TID or a thread group - 
however that is specified. I believe the KVM guys would like to see 
something like this as well - which might be the "other things" referred 
to above.
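
As a rough userspace sketch of how such a call might be used (the
sys_yield_to() wrapper below is entirely hypothetical -- no such
syscall number exists):

    #include <sys/syscall.h>
    #include <unistd.h>

    /* Hypothetical wrapper: fails with ENOSYS today, shown only to
     * illustrate the intended usage. */
    static long sys_yield_to(pid_t target)
    {
            return syscall(-1 /* __NR_yield_to */, target);
    }

    /* Assumed futex value policy: 0 == free, else the holder's TID. */
    static void yield_to_lock(int *uval, pid_t self)
    {
            for (;;) {
                    int cur = *uval;

                    if (cur == 0 &&
                        __sync_bool_compare_and_swap(uval, 0, self))
                            return;            /* acquired */
                    if (cur)
                            sys_yield_to(cur); /* donate our slice */
            }
    }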

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 19:31                   ` Thomas Gleixner
  2010-04-06 20:02                     ` Ulrich Drepper
@ 2010-04-07  5:33                     ` Avi Kivity
  1 sibling, 0 replies; 62+ messages in thread
From: Avi Kivity @ 2010-04-07  5:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alan Cox, Darren Hart, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Eric Dumazet, Peter W. Morreale, Rik van Riel, Steven Rostedt,
	Gregory Haskins, Sven-Thorsten Dietrich, Chris Mason,
	John Cooper, Chris Wright

On 04/06/2010 10:31 PM, Thomas Gleixner wrote:
>
>> That gives you something along the lines of
>>
>> 	runaddr = find_run_flag(lock);
>> 	do {
>> 		while(*runaddr == RUNNING) {
>> 			if (trylock(lock))
>> 				return WHOOPEE;
>> 			cpu relax
>> 		}
>> 		yield (_on(thread));
>>      
> That would require a new yield_to_target() syscall, which either
> blocks the caller when the target thread is not runnable or returns an
> error code which signals to go into the slow path.
>    

Note, directed yield is something that kvm wants for its own ends.  As 
an internal kernel api, not a syscall, but it's apparently useful.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 23:16                       ` Thomas Gleixner
  2010-04-06 23:36                         ` Darren Hart
@ 2010-04-07  6:08                         ` drepper
  2010-04-08  3:41                           ` Darren Hart
  1 sibling, 1 reply; 62+ messages in thread
From: drepper @ 2010-04-07  6:08 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Alan Cox, Darren Hart, Peter Zijlstra, Avi Kivity, linux-kernel,
	Ingo Molnar, Eric Dumazet, Peter W. Morreale, Rik van Riel,
	Steven Rostedt, Gregory Haskins, Sven-Thorsten Dietrich,
	Chris Mason, John Cooper, Chris Wright

On Tue, Apr 6, 2010 at 16:16, Thomas Gleixner <tglx@linutronix.de> wrote:
> I know that you can do any weird stuff with the futex value, but I
> don't see the "dramatic" limitation. Care to elaborate ?

If we have to fill in the PID we can represent only three states in a futex: 0, PID, -PID.  Today we can represent 2^32 states.  Quite a difference.


> The per thread pinned page would be unconditional, right ?

Only if the process would be using these adaptive mutexes.  It could be conditional.


> I agree that benchmarking would be interesting, but OTOH I fear that
> we open up a huge can of worms with exposing scheduler details and the
> related necessary syscalls like sys_yield_to: User space thread
> management/scheduling comes to my mind and I hope we agree that we do
> not want to revisit that.

I'm not sure.  We never got to the bottom of this.  Why are these details something that should not be disclosed?  It's clear that there is descheduling, and the sys_yield_to syscall would require nothing to happen beyond indicating to the kernel execution dependencies that it cannot necessarily discover on its own, at least not efficiently.


> Useful for what ?

We already have places where we could spin a bit using sys_yield_to because we know what we are waiting on.


> What are the exact semantics of such a syscall ?

It gives the kernel the hint that the current thread is willing to hand over the remaining time on its timeslice to the target thread.  This target thread, if sleeping, can immediately make progress.  Yes, this might mean moving the target thread to the core executing the yielding thread.  Perhaps this doesn't make sense in some situations.  In this case the syscall could be a no-op, perhaps indicating this in the return value.


> How does that fit into the various scheduling constraints ?

I don't know enough about all the constraints.  As I said, it could be a hint.  If the constraints forbid the timeslice transfer it need not happen.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-06 16:20                     ` Avi Kivity
@ 2010-04-07  6:18                       ` john cooper
  2010-04-08  3:33                         ` Darren Hart
  0 siblings, 1 reply; 62+ messages in thread
From: john cooper @ 2010-04-07  6:18 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Thomas Gleixner, Darren Hart, Alan Cox, Peter Zijlstra,
	linux-kernel, Ingo Molnar, Eric Dumazet, Peter W. Morreale,
	Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, Chris Wright, john cooper

Avi Kivity wrote:
> On 04/06/2010 07:14 PM, Thomas Gleixner wrote:
>>
>>> IMO the best solution is to spin in userspace while the lock holder is
>>> running, fall into the kernel when it is scheduled out.
>>>      
>> That's just not realistic as user space has no idea whether the lock
>> holder is running or not and when it's scheduled out without a syscall :)
>>    
> 
> The kernel could easily expose this information by writing into the
> thread's TLS area.
> 
> So:
> 
> - the kernel maintains a current_cpu field in a thread's tls
> - lock() atomically writes a pointer to the current thread's current_cpu
> when acquiring
> - the kernel writes an invalid value to current_cpu when switching out
> - a contended lock() retrieves the current_cpu pointer, and spins as
> long as it is a valid cpu

There are certainly details to sort through in the packaging
of the mechanism but conceptually that should do the job.
So here the application has chosen a blocking lock as being
the optimal synchronization operation and we're detecting a
scenario where we can factor out the aggregate overhead of two
context switch operations.
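
For illustration, the contended path of such a lock might look roughly
like this -- all the TLS plumbing names below are hypothetical, and the
pause instruction and FUTEX_WAIT fallback are just one way to fill in
the details:

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    #define CPU_INVALID (-1)

    /* Hypothetical: the kernel keeps current_cpu up to date. */
    struct tls_sched_info {
            volatile int current_cpu;
    };

    struct ulock {
            volatile int tid;                        /* 0 == free */
            struct tls_sched_info *volatile holder;  /* set on acquire */
    };

    static int ulock_contended(struct ulock *l, int self,
                               struct tls_sched_info *me)
    {
            for (;;) {
                    struct tls_sched_info *owner = l->holder;

                    /* Spin only while the holder is on a cpu. */
                    while (owner && owner->current_cpu != CPU_INVALID) {
                            if (__sync_bool_compare_and_swap(&l->tid,
                                                             0, self)) {
                                    l->holder = me;
                                    return 0;
                            }
                            owner = l->holder;
                            __asm__ __volatile__("rep; nop"); /* pause */
                    }

                    /* Holder off cpu: block in the kernel.  The unlock
                     * path must FUTEX_WAKE waiters (not shown). */
                    syscall(SYS_futex, &l->tid, FUTEX_WAIT,
                            l->tid, NULL, NULL, 0);
            }
    }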

There is also the case where the application requires a
polled lock with the rational being the assumed lock
hold/wait time is substantially less than the above context
switch overhead.  But here we're otherwise completely
open to indiscriminate scheduling preemption even though
we may be holding a userland lock.

The adaptive mutex above is an optimization beyond what
is normally expected for the associated model.  The preemption
of a polled lock OTOH can easily inflict latency several orders
of magnitude beyond what is expected in that model.  Two use
cases exist here which IMO aren't related except for the latter
unintentionally degenerating into the former.

-john

-- 
john.cooper@third-harmonic.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-04-06 16:55   ` Thomas Gleixner
@ 2010-04-07 17:26     ` Darren Hart
  2010-04-07 19:59       ` Thomas Gleixner
  0 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-07 17:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Peter Zijlstra

Thomas Gleixner wrote:
> On Mon, 5 Apr 2010, Darren Hart wrote:
>> +#ifdef CONFIG_SMP
>> +struct thread_info *futex_owner(u32 __user *uaddr)
>> +{
>> +	struct thread_info *ti = NULL;
>> +	struct task_struct *p;
>> +	pid_t pid;
>> +	u32 uval;
>> +
>> +	if (get_futex_value_locked(&uval, uaddr))
>> +		return NULL;
>> +
>> +	pid = uval & FUTEX_TID_MASK;
>> +	rcu_read_lock();
>> +	p = find_task_by_vpid(pid);
>> +	if (p) {
>> +		const struct cred *cred, *pcred;
>> +
>> +		cred = current_cred();
>> +		pcred = __task_cred(p);
>> +		if (cred->euid == pcred->euid ||
>> +		    cred->euid == pcred->uid)
>> +			ti = task_thread_info(p);
> 
>   We want a get_task_struct() here, don't we ?


It would appear so. We access owner->cpu in futex_spin_on_owner. Done.


>> +	}
>> +	rcu_read_unlock();
>> +
>> +	return ti;
>> +}
>> +#endif
>> +
> 
>> +/**
>> + * trylock_futex_adaptive() - Try to acquire the futex lock in a busy loop
>> + * @uaddr: the futex user address
>> + *
>> + * Try to acquire a futex lock in a loop until the owner changes or the owner
>> + * is descheduled. To lock the futex, set the value to the current TID.
>> + *
>> + * Returns:
>> + *  0 - Gave up, lock not acquired
>> + *  1 - Futex lock acquired
>> + * <0 - On error
>> + */
>> +static int trylock_futex_adaptive(u32 __user *uaddr)
>> +{
>> +	int ret = 0;
>> +	u32 curval;
>> +
>> +	for (;;) {
>> +		struct thread_info *owner;
>> +
>> +		owner = futex_owner(uaddr);
>> +		if (owner && futex_spin_on_owner(uaddr, owner))
>> +			break;
>> +
>> +		/*
>> +		 * Try to acquire the lock. If we acquire it or error out,
>> +		 * break to enable preemption and handle ret outside the loop.
>> +		 */
>> +		curval = cmpxchg_futex_value_locked(uaddr, 0,
>> +		                                    task_pid_vnr(current));
>> +
>> +		if (curval == 0) {
>> +			ret = 1;
>> +			break;
>> +		}
>> +
>> +		if (unlikely(curval == -EFAULT)) {
>> +			ret = -EFAULT;
>> +			break;
>> +		}
>> +
>> +		/*
>> +		 * Futex locks manage the owner atomically. We are not in
>> +		 * danger of deadlock by preempting a pending owner. Abort the
>> +		 * loop if our time slice has expired.  RT Threads can spin
>> +		 * until they preempt the owner, at which point they will break
>> +		 * out of the loop anyway.
>> +		 */
>> +		if (need_resched())
>> +			break;
>> +
>> +		cpu_relax();
>> +	}
>> +	return ret;
>> +}
> 
> 
> Hmm. The order is weird. Why don't you do that simpler ?
> 
> Get the uval, the tid and the thread_info pointer outside of the
> loop. Also task_pid_vnr(current) just needs a one time lookup.

Eeek. Having the owner in the loop is a good way to negate the benefits
of adaptive spinning by spinning forever (unlikely, but it could
certainly spin across multiple owners). Nice catch.

As for the uval.... I'm not sure what you mean. You get curval below
inside the loop, and there is no "uval" in the my version of the code.

As for the order, I had put the initial spin prior to the cmpxchg to
avoid doing too many cmpxchg's in a row as they are rather expensive.
However, since this is (now) the first opportunity to try to acquire
the lock atomically after entering the futex syscall, I think you're
right, it should be the first thing in the loop.

> 
> change the loop to do:
> 
>        for (;;) {
>        	   curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
>   
> 	   if (!curval)
> 	      return 1;

Single return point makes instrumentation so much easier. Unless folks
_really_ object, I'll leave it as is until we're closer to merging.

> 	   if ((curval & FUTEX_TID_MASK) != ownertid) {
> 	      ownertid = curval & FUTEX_TID_MASK;
> 	      owner = update_owner(ownertid);
> 	   }


Hrm... at this point the owner has changed... so we should break and go
to sleep, not update the owner and start spinning again. The
futex_spin_on_owner() will detect this and abort, so I'm not seeing the
purpose of the above if() block.

...

>> +static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
>> +{
>> +	struct hrtimer_sleeper timeout, *to = NULL;
>> +	struct futex_hash_bucket *hb;
>> +	struct futex_q q = FUTEX_Q_INIT;
>> +	int ret = 0;
>> +
>> +	if (refill_pi_state_cache())
>> +		return -ENOMEM;
>> +
>> +	if (time) {
>> +		to = &timeout;
>> +		hrtimer_init_on_stack(&to->timer, CLOCK_REALTIME,
>> +				      HRTIMER_MODE_ABS);
> 
>   Please make that like the WAIT_BITSET one, which can select the
>   clock and defaults to MONOTONIC.


Done.

> 
>> +		hrtimer_init_sleeper(to, current);
>> +		hrtimer_set_expires(&to->timer, *time);
>> +	}
> 
>   Why setup all this _before_ trying the adaptive spin ?


I placed the retry: label above the adaptive spin loop. This way if we 
wake a task and the lock is "stolen" it doesn't just go right back to 
sleep. This should aid in fairness and also performance in less 
contended cases. I didn't think it was worth an "if (first_time_through 
&& time)" sort of block to be able to set up the timer after the spin loop.


>> +retry:
>> +#ifdef CONFIG_SMP
>> +	if (flags & FLAGS_ADAPTIVE) {
> 
>   Why do we want that non adaptive version at all ?


We don't. I'm using it here simply to measure the impact of adaptive 
spinning. Eventually all that will matter is how it stacks up against a 
raw futex_wait/wake mutex implementation like 
pthread_mutex_lock/unlock(). But in the early stages it's nice to be 
able to eliminate all differences other than adaptive spinning.

...

>> +	if (to)
>> +		destroy_hrtimer_on_stack(&to->timer);
>> +	return ret != -EINTR ? ret : -ERESTARTNOINTR;
> 
>   Which code sets -EINTR ?


Cruft left over from the futex_lock_pi() roots. Removed.

...

>> diff --git a/kernel/sched.c b/kernel/sched.c
>> index 9ab3cd7..a2dbb5b 100644
>> --- a/kernel/sched.c
>> +++ b/kernel/sched.c
>> @@ -3818,6 +3818,65 @@ int mutex_spin_on_owner(struct mutex *lock, struct thread_info *owner)
>>  out:
>>  	return 1;
>>  }
>> +
>> +#ifdef CONFIG_FUTEX
>> +#include <linux/futex.h>
>> +
>> +int futex_spin_on_owner(u32 __user *uaddr, struct thread_info *owner)
>> +{
>> +	unsigned int cpu;
>> +	struct rq *rq;
>> +
>> +	if (!sched_feat(OWNER_SPIN))
>> +		return 0;
>> +
>> +#ifdef CONFIG_DEBUG_PAGEALLOC
>> +	/*
>> +	 * Need to access the cpu field knowing that
>> +	 * DEBUG_PAGEALLOC could have unmapped it if
>> +	 * the mutex owner just released it and exited.
>> +	 */
>> +	if (probe_kernel_address(&owner->cpu, cpu))
>> +		goto out;
>> +#else
>> +	cpu = owner->cpu;
>> +#endif
>> +
>> +	/*
>> +	 * Even if the access succeeded (likely case),
>> +	 * the cpu field may no longer be valid.
>> +	 */
>> +	if (cpu >= nr_cpumask_bits)
>> +		goto out;
>> +
>> +	/*
>> +	 * We need to validate that we can do a
>> +	 * get_cpu() and that we have the percpu area.
>> +	 */
>> +	if (!cpu_online(cpu))
>> +		goto out;
>> +
>> +	rq = cpu_rq(cpu);
>> +
>> +	for (;;) {
>> +		/*
>> +		 * Owner changed, break to re-assess state.
>> +		 */
>> +		if (futex_owner(uaddr) != owner)
>> +			break;
> 
>   Uurg. It's enough to check whether the TID value changed. No need to
>   look up the full thing in every iteration.


OK, we can skip the euid, uid credentials checking by doing a get_user() 
instead.
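
Something along these lines in the spin loop, with ownertid cached once
at entry (sketch; the user access would need the same pagefault care as
get_futex_value_locked()):

    u32 curval;

    /* Cheap owner-change check: no task lookup, no cred check. */
    if (get_user(curval, uaddr))
            break;
    if ((curval & FUTEX_TID_MASK) != ownertid)
            break;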


> 
>> +		/*
>> +		 * Is that owner really running on that cpu?
>> +		 */
>> +		if (task_thread_info(rq->curr) != owner || need_resched())
>> +			return 0;
>> +
>> +		cpu_relax();
>> +	}
>> +out:
>> +	return 1;
>> +}
>> +#endif
>>  #endif
> 
> Do we really need all this code ? A simple owner->on_cpu (owner needs
> to be the task_struct then) would be sufficient to figure that out,
> wouldn't it?

As Peter pointed out in IRC, p->oncpu isn't generic. I'll go trolling 
through the mutex_spin_on_owner() discussions to see if I can determine 
why that's the case.

Thank you for the review.

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] futex: handle timeout inside adaptive lock spin
  2010-04-06  8:27   ` Thomas Gleixner
@ 2010-04-07 17:31     ` Darren Hart
  2010-04-07 18:44       ` Gregory Haskins
  0 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-07 17:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity

Thomas Gleixner wrote:
> On Mon, 5 Apr 2010, Darren Hart wrote:
> 
>> Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
>> ---
>>  kernel/futex.c |   24 +++++++++++++++++++++---
>>  1 files changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index c33ac2a..af61dcd 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -2385,6 +2385,7 @@ out:
>>  /**
>>   * trylock_futex_adaptive() - Try to acquire the futex lock in a busy loop
>>   * @uaddr: the futex user address
>> + * @timeout: absolute timeout or NULL if none
>>   *
>>   * Try to acquire a futex lock in a loop until the owner changes or the owner
>>   * is descheduled. To lock the futex, set the value to the current TID.
>> @@ -2394,10 +2395,11 @@ out:
>>   *  1 - Futex lock acquired
>>   * <0 - On error
>>   */
>> -static int trylock_futex_adaptive(u32 __user *uaddr)
>> +static int trylock_futex_adaptive(u32 __user *uaddr, ktime_t *timeout)
>>  {
>>  	int ret = 0;
>>  	u32 curval;
>> +	ktime_t now;
>>  
>>  	for (;;) {
>>  		struct thread_info *owner;
>> @@ -2433,6 +2435,22 @@ static int trylock_futex_adaptive(u32 __user *uaddr)
>>  		if (need_resched())
>>  			break;
>>  
>> +		if (timeout) {
>> +			now = ktime_get();
> 
>   Hmm. Calling that in every iteration might hurt especially on non
>   TSC systems, but well...

I haven't come across a better alternative since arming the timer before 
setting TASK_INTERRUPTIBLE isn't appropriate.

> 
>> +/* FIXME: consider creating ktime_less_than(lhs, rhs) */
> 
>   No need. The .tv64 comparison works in both cases. :)

Ah, for some reason I was thinking that was only the case if 
CONFIG_KTIME_SCALAR was set. Very nice, thanks.

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] futex: handle timeout inside adaptive lock spin
  2010-04-07 17:31     ` Darren Hart
@ 2010-04-07 18:44       ` Gregory Haskins
  2010-04-07 23:15         ` Darren Hart
  0 siblings, 1 reply; 62+ messages in thread
From: Gregory Haskins @ 2010-04-07 18:44 UTC (permalink / raw)
  To: Thomas Gleixner, Darren Hart
  Cc: Ingo Molnar, Eric Dumazet, Steven Rostedt, Peter Zijlstra,
	Peter Morreale, Sven Dietrich, Chris Mason, Avi Kivity,
	Rik van Riel, Chris Wright, John Cooper, linux-kernel

>>> On 4/7/2010 at 01:31 PM, in message <4BBCC174.7020409@us.ibm.com>, Darren Hart <dvhltc@us.ibm.com> wrote:
> Thomas Gleixner wrote:
>> On Mon, 5 Apr 2010, Darren Hart wrote:
>> 
>>> Signed-off-by: Darren Hart <dvhltc@us.ibm.com>

>>> +		if (timeout) {
>>> +			now = ktime_get();
>> 
>>   Hmm. Calling that in every iteration might hurt especially on non
>>   TSC systems, but well...
> 
> I haven't come across a better alternative since arming the timer before 
> setting TASK_INTERRUPTIBLE isn't appropriate.

Hey Darren,

I remember we tried something similar in early versions of the adaptive locks and this was definitely bad. :(

It ended up putting so much contention on the xtime_lock (IIRC) that it
resulted in adaptive locks hurting overall performance versus not using
adaptive at all.  Alternative mechanisms employed a hybrid where the inner
loops used a pseudo-calibrated counter loop, and the outer loop checked
periodically against a real clock.  It all plays into the "you are burning
CPU cycles anyway, so might as well put them to use" theory.  Hacky, yes,
but it did relieve the pressure on the time subsystem locks and freed up a
_lot_ of performance.  Without this, the concept of timeouts+adaptive was
unworkable.

I think Steven ultimately rejected the timeout related patches outright
when he merged adaptive to -rt, but I think Sven pulled them into SLERT if
you would like a potential code reference to a working solution.
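
Roughly, the hybrid looked something like this (sketch only;
SPINS_PER_CLOCK_CHECK is a made-up constant that would need real
calibration, and spin_iteration() stands in for one pass of the
trylock/owner-check body):

    /* Inner loop burns a pseudo-calibrated number of iterations; the
     * outer loop pays for a real clock read only once per batch. */
    #define SPINS_PER_CLOCK_CHECK 1000      /* needs calibration */

    static int adaptive_spin_timeout(u32 __user *uaddr, ktime_t *timeout)
    {
            for (;;) {
                    int i, ret;

                    for (i = 0; i < SPINS_PER_CLOCK_CHECK; i++) {
                            ret = spin_iteration(uaddr);    /* stand-in */
                            if (ret)
                                    return ret;     /* acquired or error */
                            cpu_relax();
                    }

                    if (timeout && ktime_get().tv64 >= timeout->tv64)
                            return -ETIMEDOUT;

                    if (need_resched())
                            return 0;
            }
    }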

-Greg




^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-04-07 17:26     ` Darren Hart
@ 2010-04-07 19:59       ` Thomas Gleixner
  2010-04-08  3:25         ` Darren Hart
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Gleixner @ 2010-04-07 19:59 UTC (permalink / raw)
  To: Darren Hart
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Peter Zijlstra

On Wed, 7 Apr 2010, Darren Hart wrote:
> Thomas Gleixner wrote:
> > On Mon, 5 Apr 2010, Darren Hart wrote:
> > Hmm. The order is weird. Why don't you do that simpler ?
> > 
> > Get the uval, the tid and the thread_info pointer outside of the
> > loop. Also task_pid_vnr(current) just needs a one time lookup.
> 
> Eeek. Having the owner in the loop is a good way to negate the benefits
> of adaptive spinning by spinning forever (unlikely, but it could
> certainly spin across multiple owners). Nice catch.
> 
> As for the uval.... I'm not sure what you mean. You get curval below
> inside the loop, and there is no "uval" in the my version of the code.

Well, you need a first time lookup of owner and ownertid for which you
need the user space value (uval), but thinking more about it it's not
even necessary. Just initialize ownertid to 0 so it will drop into the
lookup code when we did not acquire the futex in the cmpxchg.
 
> As for the order, I had put the initial spin prior to the cmpxchg to
> avoid doing too many cmpxchg's in a row as they are rather expensive.
> However, since this is (now) the first opportunity to try to acquire
> the lock atomically after entering the futex syscall, I think you're
> right, it should be the first thing in the loop.
> 
> > 
> > change the loop to do:
> > 
> >        for (;;) {
> >        	   curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
> >   	   if (!curval)
> > 	      return 1;
> 
> Single return point makes instrumentation so much easier. Unless folks
> _really_ object, I'll leave it as is until we're closer to merging.

I don't care either way. That was just example code.
 
> > 	   if ((curval & FUTEX_TID_MASK) != ownertid) {
> > 	      ownertid = curval & FUTEX_TID_MASK;
> > 	      owner = update_owner(ownertid);
> > 	   }
> 
> 
> Hrm... at this point the owner has changed... so we should break and go
> to sleep, not update the owner and start spinning again. The
> futex_spin_on_owner() will detect this and abort, so I'm not seeing the
> purpose of the above if() block.

Why ? If the owner has changed and the new owner is running on another
cpu then why not spin further ?

> > > +		hrtimer_init_sleeper(to, current);
> > > +		hrtimer_set_expires(&to->timer, *time);
> > > +	}
> > 
> >   Why setup all this _before_ trying the adaptive spin ?
> 
> 
> I placed the retry: label above the adaptive spin loop. This way if we wake a
> task and the lock is "stolen" it doesn't just go right back to sleep. This
> should aid in fairness and also performance in less contended cases. I didn't
> think it was worth an "if (first_time_through && time)" sort of block to be
> able to set up the timer after the spin loop.

Hmm, ok.
 
> > 
> > Do we really need all this code ? A simple owner->on_cpu (owner needs
> > to be the task_struct then) would be sufficient to figure that out,
> > wouldn't it?
> 
> As Peter pointed out in IRC, p->oncpu isn't generic. I'll go trolling through
> the mutex_spin_on_owner() discussions to see if I can determine why that's the
> case.

AFAICT p->oncpu is the correct thing to use when CONFIG_SMP=y. All it
needs is a simple accessor function and you can keep all the futex
cruft in futex.c where it belongs.
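
i.e. something like this (sketch, assuming oncpu were made available
whenever CONFIG_SMP=y):

    /* kernel/sched.c: trivial accessor, keeps rq details private. */
    #ifdef CONFIG_SMP
    int task_oncpu(struct task_struct *p)
    {
            return p->oncpu;
    }
    #endif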

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 5/6] futex: handle timeout inside adaptive lock  spin
  2010-04-07 18:44       ` Gregory Haskins
@ 2010-04-07 23:15         ` Darren Hart
  0 siblings, 0 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-07 23:15 UTC (permalink / raw)
  To: Gregory Haskins
  Cc: Thomas Gleixner, Ingo Molnar, Eric Dumazet, Steven Rostedt,
	Peter Zijlstra, Peter Morreale, Sven Dietrich, Chris Mason,
	Avi Kivity, Rik van Riel, Chris Wright, John Cooper,
	linux-kernel

Gregory Haskins wrote:
 >>>> On 4/7/2010 at 01:31 PM, in message <4BBCC174.7020409@us.ibm.com>, Darren Hart <dvhltc@us.ibm.com> wrote:
 >> Thomas Gleixner wrote:
 >>> On Mon, 5 Apr 2010, Darren Hart wrote:
 >>>
 >>>> Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
 >
 >>>> +		if (timeout) {
 >>>> +			now = ktime_get();
 >>>   Hmm. Calling that in every iteration might hurt especially on non
 >>>   TSC systems, but well...
 >> I haven't come across a better alternative since arming the timer before
 >> setting TASK_INTERRUPTIBLE isn't appropriate.
 >
 > Hey Darren,
 >
 > I remember we tried something similar in early versions of the
 > adaptive locks and this was definitely bad. :(
 >
 > It ended up putting so much contention on the xtime_lock (IIRC) that
 > it resulted in adaptive locks hurting overall performance verses not
 > using adaptive at all.  Alternative mechanisms employed a hybrid where
 > the inner loops used a pseudo calibrated counter loop, and the outer
 > loop checks periodically against a real clock.  It all plays into "you
 > are burning CPU cycles anyway, so might as well put them to use"
 > theory.  Hacky, yes, but it did relieve the pressure on the time
 > subsystem locks and freed up a _lot_ of performance.  Without this,
 > the concept of timeouts+adaptive was unworkable.  I think Steven
 > ultimately rejected the timeout related patches outright when he
 > merged adaptive to -rt, but I think Sven pulled them into SLERT if you
 > would like a potential code reference to a working solution.
 >

Hi Greg,

Thanks for the info! I haven't tested with timeouts yet as I'm still 
struggling to get decent performance out of just plain old adaptive 
right now. I'll keep that in mind when I get to that, and yeah, if the 
plan is to burn CPU cycles, might as well do something constructive :-)

I do feel the timeouts are a necessary feature. Interruptibility may be 
as well, but I'm going to ignore it for the time being...

--
Darren

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-04-07 19:59       ` Thomas Gleixner
@ 2010-04-08  3:25         ` Darren Hart
  2010-04-08 23:10           ` Peter W. Morreale
  0 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-08  3:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Peter Zijlstra

Thomas Gleixner wrote:
> On Wed, 7 Apr 2010, Darren Hart wrote:
>> Thomas Gleixner wrote:
>>> On Mon, 5 Apr 2010, Darren Hart wrote:
>>> Hmm. The order is weird. Why don't you do that simpler ?
>>>
>>> Get the uval, the tid and the thread_info pointer outside of the
>>> loop. Also task_pid_vnr(current) just needs a one time lookup.
>> Eeek. Having the owner in the loop is a good way to negate the benefits
>> of adaptive spinning by spinning forever (unlikely, but it could
>> certainly spin across multiple owners). Nice catch.
>>
>> As for the uval.... I'm not sure what you mean. You get curval below
>> inside the loop, and there is no "uval" in the my version of the code.
> 
> Well, you need a first time lookup of owner and ownertid for which you
> need the user space value (uval),
> but thinking more about it it's not
> even necessary. Just initialize ownertid to 0 so it will drop into the
> lookup code when we did not acquire the futex in the cmpxchg.

No need for ownertid at all really. The cmpxchg always tries to go from 
0 to curtid. I've pushed the futex_owner() call outside the loop for a 
one time lookup.

> 
>> As for the order, I had put the initial spin prior to the cmpxchg to
>> avoid doing too many cmpxchg's in a row as they are rather expensive.
>> However, since this is (now) the first opportunity to try to acquire
>> the lock atomically after entering the futex syscall, I think you're
>> right, it should be the first thing in the loop.
>>
>>> change the loop to do:
>>>
>>>        for (;;) {
>>>        	   curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
>>>   	   if (!curval)
>>> 	      return 1;
>> Single return point makes instrumentation so much easier. Unless folks
>> _really_ object, I'll leave it as is until we're closer to merging.
> 
> I don't care either way. That was just example code.
> 
>>> 	   if ((curval & FUTEX_TID_MASK) != ownertid) {
>>> 	      ownertid = curval & FUTEX_TID_MASK;
>>> 	      owner = update_owner(ownertid);
>>> 	   }
>>
>> Hrm... at this point the owner has changed... so we should break and go
>> to sleep, not update the owner and start spinning again. The
>> futex_spin_on_owner() will detect this and abort, so I'm not seeing the
>> purpose of the above if() block.
> 
> Why ? If the owner has changed and the new owner is running on another
> cpu then why not spin further ?

That's an interesting question, and I'm not sure what the right answer 
is. The current approach of the adaptive spinning in the kernel is to 
spin until the owner changes or deschedules, then stop and block. The 
idea is that if you didn't get the lock before the owner changed, you 
aren't going to get it in a very short period of time (you have at least 
an entire critical section to wait through plus whatever time you've 
already spent spinning). However, blocking just so another task can spin 
doesn't really make sense either, and makes the lock less fair than it 
could otherwise be.

My goal in starting this is to provide a more intelligent mechanism than 
sched_yield() for userspace to use to determine when to spin and when to 
sleep. The current implementation allows for spinning up until the owner 
changes, deschedules, or the timeslice expires. I believe these are much 
better than spinning for some fixed number of cycles and then yielding for 
some unpredictable amount of time until CFS decides to schedule you back in.

Still, the criteria for breaking the spin are something that needs more 
eyes, and more numbers before I can be confident in any approach.

> 
>>>> +		hrtimer_init_sleeper(to, current);
>>>> +		hrtimer_set_expires(&to->timer, *time);
>>>> +	}
>>>   Why setup all this _before_ trying the adaptive spin ?
>>
>> I placed the retry: label above the adaptive spin loop. This way if we wake a
>> task and the lock is "stolen" it doesn't just go right back to sleep. This
>> should aid in fairness and also performance in less contended cases. I didn't
>> think it was worth an "if (first_time_through && time)" sort of block to be
>> able to set up the timer after the spin loop.
> 
> Hmm, ok.
> 
>>> Do we really need all this code ? A simple owner->on_cpu (owner needs
>>> to be the task_struct then) would be sufficient to figure that out,
>>> wouldn't it?
>> As Peter pointed out in IRC, p->oncpu isn't generic. I'll go trolling through
>> the mutex_spin_on_owner() discussions to see if I can determine why that's the
>> case.
> 
> AFAICT p->oncpu is the correct thing to use when CONFIG_SMP=y. All it
> needs is a simple accessor function and you can keep all the futex
> cruft in futex.c where it belongs.

Noted.

Thanks,
-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-07  6:18                       ` john cooper
@ 2010-04-08  3:33                         ` Darren Hart
  2010-04-09  5:52                           ` john cooper
  0 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-08  3:33 UTC (permalink / raw)
  To: john cooper
  Cc: Avi Kivity, Thomas Gleixner, Alan Cox, Peter Zijlstra,
	linux-kernel, Ingo Molnar, Eric Dumazet, Peter W. Morreale,
	Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, Chris Wright, john cooper

john cooper wrote:
> Avi Kivity wrote:
>> On 04/06/2010 07:14 PM, Thomas Gleixner wrote:
>>>> IMO the best solution is to spin in userspace while the lock holder is
>>>> running, fall into the kernel when it is scheduled out.
>>>>      
>>> That's just not realistic as user space has no idea whether the lock
>>> holder is running or not and when it's scheduled out without a syscall :)
>>>    
>> The kernel could easily expose this information by writing into the
>> thread's TLS area.
>>
>> So:
>>
>> - the kernel maintains a current_cpu field in a thread's tls
>> - lock() atomically writes a pointer to the current thread's current_cpu
>> when acquiring
>> - the kernel writes an invalid value to current_cpu when switching out
>> - a contended lock() retrieves the current_cpu pointer, and spins as
>> long as it is a valid cpu
> 
> There are certainly details to sort through in the packaging
> of the mechanism but conceptually that should do the job.
> So here the application has chosen a blocking lock as being
> the optimal synchronization operation and we're detecting a
> scenario where we can factor out the aggregate overhead of two
> context switch operations.

I didn't intend to change the behavior of an existing blocking call with 
adaptive spinning if that is what you are getting at here. Initially 
there would be a new futex op, something like FUTEX_LOCK_ADAPTIVE or 
maybe just FUTEX_WAIT_ADAPTIVE. Applications can use this directly to 
implement adaptive spinlocks. Ideally glibc would make use of this via 
either the existing adaptive spinning NP API or via a new one. Before we 
even go there, we need to see if this can provide a real benefit.

> 
> There is also the case where the application requires a
> polled lock with the rational being the assumed lock
> hold/wait time is substantially less than the above context
> switch overhead.

Polled lock == userspace spinlock?

> But here we're otherwise completely
> open to indiscriminate scheduling preemption even though
> we may be holding a userland lock.

That's true with any userland lock.

> The adaptive mutex above is an optimization beyond what
> is normally expected for the associated model.  The preemption
> of a polled lock OTOH can easily inflict latency several orders
> of magnitude beyond what is expected in that model.  Two use
> cases exist here which IMO aren't related except for the latter
> unintentionally degenerating into the former.

Again, my intention is not to replace any existing functionality, so 
applications would have to explicitly request this behavior.

If I'm missing your point, please elaborate.

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-07  6:08                         ` drepper
@ 2010-04-08  3:41                           ` Darren Hart
  2010-04-08  4:29                             ` drepper
  0 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-08  3:41 UTC (permalink / raw)
  To: drepper
  Cc: Thomas Gleixner, Alan Cox, Peter Zijlstra, Avi Kivity,
	linux-kernel, Ingo Molnar, Eric Dumazet, Peter W. Morreale,
	Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright

drepper@gmail.com wrote:
> On Tue, Apr 6, 2010 at 16:16, Thomas Gleixner <tglx@linutronix.de> wrote:
>> I know that you can do any weird stuff with the futex value, but I
>> don't see the "dramatic" limitation. Care to elaborate ?
> 
> If we have to fill in the PID we can represent only three states in a 
> futex: 0, PID, -PID.  Today we can represent 2^32 states.  Quite a 
> difference.

For general futexes sure, but not for robust or PI mutexes. Having 
adaptive futexes be based on the TID|WAITERS_FLAG policy certainly isn't 
breaking new ground, and is consistent with the other kernel-side futex 
locking implementations.

However, I agree that a FUTEX_SET_WAIT_ADAPTIVE sort of call might be 
very powerful. That said, it may be just academic until I can show an 
improvement with adaptive futexes.

>> The per thread pinned page would be unconditional, right ?
> 
> Only if the process would be using these adaptive mutexes.  It could be 
> conditional.

What about the concern of this TLS approach only working for process 
private locks? I would very much like to make this work for both shared 
and private locks.

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-08  3:41                           ` Darren Hart
@ 2010-04-08  4:29                             ` drepper
  0 siblings, 0 replies; 62+ messages in thread
From: drepper @ 2010-04-08  4:29 UTC (permalink / raw)
  To: Darren Hart
  Cc: Thomas Gleixner, Alan Cox, Peter Zijlstra, Avi Kivity,
	linux-kernel, Ingo Molnar, Eric Dumazet, Peter W. Morreale,
	Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright


On Wed, Apr 7, 2010 at 20:41, Darren Hart <dvhltc@us.ibm.com> wrote:
> For general futexes sure, but not for robust or PI mutexes. Having adaptive
> futexes be based on the TID|WAITERS_FLAG policy certainly isn't breaking new
> ground, and is consistent with the other kernel-side futex locking
> implementations.

PI mutexes are really unimportant in the big world.  I know your employer cares but overall it's a minute fraction.   The focus should be primarily on the normal futexes.

BTW, you want to stuff a flag in the futex word?  This doesn't work in general.  For a mutex we need three distinct values.  For PI futexes they are 0, TID, and -TID.  If we have 31-bit TID values there isn't enough room for another bit.


> What about the concern of this TLS approach only working for process private
> locks? I would very much like to make this work for both shared and private
> locks.

Again, hardly a general concern.  It's a minute fraction of mutexes that are shared.

It should be clear that the kernel approach and the userlevel approach are complementary and could even be used together.  If the userlevel variant proves significantly faster (and I assume it will) then the kernel variant could be used for shared mutexes etc.


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 6/6] futex: Add aggressive adaptive spinning argument to FUTEX_LOCK
  2010-04-05 20:23 ` [PATCH 6/6] futex: Add aggressive adaptive spinning argument to FUTEX_LOCK Darren Hart
@ 2010-04-08  5:58   ` Darren Hart
  0 siblings, 0 replies; 62+ messages in thread
From: Darren Hart @ 2010-04-08  5:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Ulrich Drepper

To eliminate syscall overhead from the equation, I modified the testcase 
to allow for forcing the syscall on lock(). Doing so cut the 
non-adaptive scores by more than half. The adaptive scores dropped 
accordingly. The relative difference between normal and adaptive 
remained intact (with my adaptive implementation lagging by 10x). So 
while the syscall overhead does impact the scores, it is not the source 
of the performance issue with the adaptive futex implementation I posted.

The following bits were being used to test for spinners and to allow 
only one spinner at a time. Obviously they failed miserably at that: 
with an instrumented kernel I found up to 8 spinners running at once.

> @@ -2497,6 +2502,14 @@ static int futex_lock(u32 __user *uaddr, int flags, int detect, ktime_t *time)
>  retry:
>  #ifdef CONFIG_SMP
>  	if (flags & FLAGS_ADAPTIVE) {
> +		if (!aas) {
> +			ret = get_user(uval, uaddr);
> +			if (ret)
> +				goto out;
> +			if (uval & FUTEX_WAITERS)
> +				goto skip_adaptive;
> +		}

The trouble at this point is that there are no more bits left in the 
word for a FUTEX_SPINNER bit. The futex word is the only per-futex 
storage we have; the futex_q is per task.

If we overload the FUTEX_WAITERS bit it will force more futex_wake() 
calls on the unlock() path. It will also effectively disable spinning 
under contention, as waiters are bound to be present in that case.
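
For reference, the cost comes from the unlock fast path: a glibc-style
unlock (simplified sketch below, assuming the usual SYS_futex calling
convention) only enters the kernel when FUTEX_WAITERS is set, so
advertising spinners in that bit turns otherwise-uncontended unlocks
into syscalls.

	#include <stdatomic.h>
	#include <stdint.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/futex.h>

	/* Simplified glibc-style unlock fast path (sketch only). */
	static void unlock(_Atomic uint32_t *word)
	{
		uint32_t old = atomic_exchange(word, 0);

		/* Only enter the kernel if someone advertised themselves. */
		if (old & FUTEX_WAITERS)
			syscall(SYS_futex, word, FUTEX_WAKE, 1, NULL, NULL, 0);
	}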

Another option I dislike is to forget about robust futexes in 
conjunction with adaptive futexes and overload the FUTEX_OWNER_DIED bit. 
Ulrich mentioned in another mail that "If we have 31 bit TID values 
there isn't enough room for another bit." Since we have two flag bits 
now, I figured TID values were 30 bits. Is there an option to run with 
31 bits or something?
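
For reference, the existing flag bits from include/linux/futex.h, which
are where the 30-bit figure comes from:

	/* Two flag bits leave 30 bits for the TID. */
	#define FUTEX_WAITERS		0x80000000
	#define FUTEX_OWNER_DIED	0x40000000
	#define FUTEX_TID_MASK		0x3fffffff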

Assuming we all agree that these options are "bad", that leaves us with 
looking for somewhere else to store the information we need, which in 
turn brings us back around to what Avi, Alan, and Ulrich were discussing 
regarding non-swappable TLS data and a pointer in the futex value.

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-04-08  3:25         ` Darren Hart
@ 2010-04-08 23:10           ` Peter W. Morreale
  2010-04-09  5:41             ` Darren Hart
  0 siblings, 1 reply; 62+ messages in thread
From: Peter W. Morreale @ 2010-04-08 23:10 UTC (permalink / raw)
  To: Darren Hart
  Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Peter Zijlstra

On Wed, 2010-04-07 at 20:25 -0700, Darren Hart wrote:
> Thomas Gleixner wrote:
> > On Wed, 7 Apr 2010, Darren Hart wrote:
> >> Thomas Gleixner wrote:
> >>> On Mon, 5 Apr 2010, Darren Hart wrote:
> >>> Hmm. The order is weird. Why don't you do that simpler ?
> >>>
> >>> Get the uval, the tid and the thread_info pointer outside of the
> >>> loop. Also task_pid_vnr(current) just needs a one time lookup.
> >> Eeek. Having the owner in the loop is a good way to negate the benefits
> >> of adaptive spinning by spinning forever (unlikely, but it could
> >> certainly spin across multiple owners). Nice catch.
> >>
> >> As for the uval.... I'm not sure what you mean. You get curval below
> >> inside the loop, and there is no "uval" in the my version of the code.
> > 
> > Well, you need a first time lookup of owner and ownertid for which you
> > need the user space value (uval),
> > but thinking more about it it's not
> > even necessary. Just initialize ownertid to 0 so it will drop into the
> > lookup code when we did not acquire the futex in the cmpxchg.
> 
> No need for ownertid at all really. The cmpxchg always tries to go from 
> 0 to curtid. I've pushed the futex_owner() call outside the loop for a 
> one time lookup.
> 
> > 
> >> As for the order, I had put the initial spin prior to the cmpxchg to
> >> avoid doing too many cmpxchg's in a row as they are rather expensive.
> >> However, since this is (now) the first opportunity to do try and acquire
> >> the lock atomically after entering the futex syscall, I think you're
> >> right, it should be the first thing in the loop.
> >>
> >>> change the loop to do:
> >>>
> >>>        for (;;) {
> >>>        	   curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
> >>>   	   if (!curval)
> >>> 	      return 1;
> >> Single return point makes instrumentation so much easier. Unless folks
> >> _really_ object, I'll leave it as is until we're closer to merging.
> > 
> > I don't care either way. That was just example code.
> > 
> >>> 	   if ((curval & FUTEX_TID_MASK) != ownertid) {
> >>> 	      ownertid = curval & FUTEX_TID_MASK;
> >>> 	      owner = update_owner(ownertid);
> >>> 	   }
> >>
> >> Hrm... at this point the owner has changed... so we should break and go
> >> to sleep, not update the owner and start spinning again. The
> >> futex_spin_on_owner() will detect this and abort, so I'm not seeing the
> >> purpose of the above if() block.
> > 
> > Why ? If the owner has changed and the new owner is running on another
> > cpu then why not spin further ?
> 
> That's an interesting question, and I'm not sure what the right answer 
> is. The current approach of the adaptive spinning in the kernel is to 
> spin until the owner changes or deschedules, then stop and block. The 
> idea is that if you didn't get the lock before the owner changed, you 
> aren't going to get it in a very short period of time (you have at least 
> an entire critical section to wait through plus whatever time you've 
> already spent spinning). However, blocking just so another task can spin 
> doesn't really make sense either, and makes the lock less fair than it 
> could otherwise be.

Not only less fair, but potentially could cause starvation, no?  Perhaps
you could see this if you changed your model to allow all contended
tasks to spin instead of just one.  

If a spinning task blocks because of an owner change, and a new task
enters and starts spinning directly after the owner change, at what
point does the original task get woken up?  It's likely that the new
spinner will get the resource next, no?  Rinse/repeat with another task
and the original spinner is starved.  

(Or am I missing something?  My understanding was that unfairness was
built into this algo... If so, then the above is a possibility, right?)

Best,
-PWM

> 
> My goal in starting this is to provide a more intelligent mechanism than 
> sched_yield() for userspace to use to determine when to spin and when to 
> sleep. The current implementation allows for spinning up until the owner 
> changes, deschedules, or the timeslice expires. I believe these are much 
> better than spinning for some fixed number of cycles and then yield for 
> some unpredictable amount of time until CFS decides to schedule you back in.
> 
> Still, the criteria for breaking the spin are something that needs more 
> eyes, and more numbers before I can be confident in any approach.
> 
> > 
> >>>> +		hrtimer_init_sleeper(to, current);
> >>>> +		hrtimer_set_expires(&to->timer, *time);
> >>>> +	}
> >>>   Why setup all this _before_ trying the adaptive spin ?
> >>
> >> I placed the retry: label above the adaptive spin loop. This way if we wake a
> >> task and the lock is "stolen" it doesn't just go right back to sleep. This
> >> should aid in fairness and also performance in less contended cases. I didn't
> >> think it was worth a "if (first_time_through && time)" sort of block to be
> >> able to setup the timer after the spin loop.
> > 
> > Hmm, ok.
> > 
> >>> Do we really need all this code ? A simple owner->on_cpu (owner needs
> >>> to be the task_struct then) would be sufficient to figure that out,
> >>> wouldn't it?
> >> As Peter pointed out in IRC, p->oncpu isn't generic. I'll go trolling through
> >> the mutex_spin_on_owner() discussions to see if I can determine why that's the
> >> case.
> > 
> > AFAICT p->oncpu is the correct thing to use when CONFIG_SMP=y. All it
> > needs is a simple accessor function and you can keep all the futex
> > cruft in futex.c where it belongs.
> 
> Noted.
> 
> Thanks,



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-04-08 23:10           ` Peter W. Morreale
@ 2010-04-09  5:41             ` Darren Hart
  2010-04-09 13:13               ` Peter W. Morreale
  0 siblings, 1 reply; 62+ messages in thread
From: Darren Hart @ 2010-04-09  5:41 UTC (permalink / raw)
  To: Peter W. Morreale
  Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Peter Zijlstra

Peter W. Morreale wrote:
> On Wed, 2010-04-07 at 20:25 -0700, Darren Hart wrote:
>> Thomas Gleixner wrote:
>>> On Wed, 7 Apr 2010, Darren Hart wrote:
>>>> Thomas Gleixner wrote:

>>>>> 	   if ((curval & FUTEX_TID_MASK) != ownertid) {
>>>>> 	      ownertid = curval & FUTEX_TID_MASK;
>>>>> 	      owner = update_owner(ownertid);
>>>>> 	   }
>>>> Hrm... at this point the owner has changed... so we should break and go
>>>> to sleep, not update the owner and start spinning again. The
>>>> futex_spin_on_owner() will detect this and abort, so I'm not seeing the
>>>> purpose of the above if() block.
>>> Why ? If the owner has changed and the new owner is running on another
>>> cpu then why not spin further ?
>> That's an interesting question, and I'm not sure what the right answer 
>> is. The current approach of the adaptive spinning in the kernel is to 
>> spin until the owner changes or deschedules, then stop and block. The 
>> idea is that if you didn't get the lock before the owner changed, you 
>> aren't going to get it in a very short period of time (you have at least 
>> an entire critical section to wait through plus whatever time you've 
>> already spent spinning). However, blocking just so another task can spin 
>> doesn't really make sense either, and makes the lock less fair than it 
>> could otherwise be.
> 
> Not only less fair, but potentially could cause starvation, no?  Perhaps
> you could see this if you changed your model to allow all contended
> tasks to spin instead of just one.  

Agreed, and V5 (just posted) does just that.

> 
> If a spinning task blocks because of an owner change, and a new task
> enters and starts spinning directly after the owner change, at what
> point does the original task get woken up?

At the time of unlock the owner will have to call FUTEX_WAKE. This task 
will wake and attempt to acquire the lock - it will lose races with 
already running contenders. Lock stealing, adaptive spinning, etc. are 
all going to lead to less fair locks in exchange for throughput.
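
For illustration, the break-out policy being discussed (spin while the
owner is unchanged and on a CPU; stop when the owner changes,
deschedules, or our own timeslice expires) amounts to something like
the following kernel-side sketch, where owner_on_cpu() is a
hypothetical accessor standing in for whatever p->oncpu wrapper is
settled on:

	/* Sketch of the spin policy. Returns when the caller should stop
	 * spinning and block instead. */
	static void futex_spin(u32 __user *uaddr, u32 ownertid,
			       struct task_struct *owner)
	{
		u32 uval;

		while (owner_on_cpu(owner) && !need_resched()) {
			if (get_user(uval, uaddr))
				return;			/* fault: just block */
			if ((uval & FUTEX_TID_MASK) != ownertid)
				return;			/* owner changed */
			cpu_relax();
		}
		/* owner descheduled or our timeslice expired: block */
	}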

>  It's likely that the new
> spinner will get the resource next, no?  Rinse/repeat with another task
> and the original spinner is starved.  
> 
> (Or am I missing something?  My understanding was that unfairness was
> built into this algo... If so, then the above is a possibility, right?)

Yes it is. These locks are typically used in situations where it is more 
important that some work gets completed than _which_ work gets completed.

Thanks,

--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning
  2010-04-08  3:33                         ` Darren Hart
@ 2010-04-09  5:52                           ` john cooper
  0 siblings, 0 replies; 62+ messages in thread
From: john cooper @ 2010-04-09  5:52 UTC (permalink / raw)
  To: Darren Hart
  Cc: john cooper, Avi Kivity, Thomas Gleixner, Alan Cox,
	Peter Zijlstra, linux-kernel, Ingo Molnar, Eric Dumazet,
	Peter W. Morreale, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, Chris Wright, john.cooper

Darren Hart wrote:
> john cooper wrote:
>> But here we're otherwise completely
>> open to indiscriminate scheduling preemption even though
>> we may be holding a userland lock.
> 
> That's true with any userland lock.

Agreed.  However, from your earlier mail it seemed that addressing
this scenario was within the scope of consideration.

There are two ways to deal with this condition: reactively, in the
sense that we act after the lock holder has been preempted and we
subsequently find ourselves spinning in sibling thread context
attempting to acquire the lock; or proactively, where we provide a
time-bounded deferral of lock holder preemption, on the assumption
that the lock hold path overhead has a negligible effect upon
deferring a potentially coincident scheduling operation.

It is fairly straightforward to demonstrate the impact on performance
with a focused microbenchmark, less so for a more "typical"
application where the effect is diluted among other app activity.
 
The two approaches are complementary, with differing system-wide
tradeoffs.  Both require some insight into the scheduling disposition
of the lock holder, the preemption deferral mechanism more so.  If a
scheme to expose scheduler state transitions to (or cooperate with)
userland locking primitives is being considered, it seems opportune
to consider support for this scenario as well.
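
For concreteness, the proactive variant would need a schedctl-style
interface along the following lines. Everything in this sketch is
hypothetical; nothing like it exists in the kernel today.

	#include <sched.h>
	#include <stdatomic.h>

	/* Hypothetical per-thread word shared with the kernel. */
	struct sched_hint {
		int defer_preempt;	/* userspace: briefly holding a lock */
		int preempt_pending;	/* kernel: a reschedule was deferred */
	};

	extern __thread struct sched_hint *sched_hint;	/* hypothetical mapping */

	static void polled_lock(_Atomic int *word)
	{
		sched_hint->defer_preempt = 1;		/* request deferral */
		while (atomic_exchange(word, 1))	/* plain polled lock */
			;
	}

	static void polled_unlock(_Atomic int *word)
	{
		atomic_store(word, 0);
		sched_hint->defer_preempt = 0;
		if (sched_hint->preempt_pending)	/* kernel deferred for us */
			sched_yield();			/* give the cpu back now */
	}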

-john

-- 
john.cooper@redhat.com

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning
  2010-04-09  5:41             ` Darren Hart
@ 2010-04-09 13:13               ` Peter W. Morreale
  0 siblings, 0 replies; 62+ messages in thread
From: Peter W. Morreale @ 2010-04-09 13:13 UTC (permalink / raw)
  To: Darren Hart
  Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Eric Dumazet, Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright,
	Avi Kivity, Peter Zijlstra

On Thu, 2010-04-08 at 22:41 -0700, Darren Hart wrote:
> Peter W. Morreale wrote:
> > On Wed, 2010-04-07 at 20:25 -0700, Darren Hart wrote:
> >> Thomas Gleixner wrote:
> >>> On Wed, 7 Apr 2010, Darren Hart wrote:
> >>>> Thomas Gleixner wrote:
> 
> >>>>> 	   if ((curval & FUTEX_TID_MASK) != ownertid) {
> >>>>> 	      ownertid = curval & FUTEX_TID_MASK;
> >>>>> 	      owner = update_owner(ownertid);
> >>>>> 	   }
> >>>> Hrm... at this point the owner has changed... so we should break and go
> >>>> to sleep, not update the owner and start spinning again. The
> >>>> futex_spin_on_owner() will detect this and abort, so I'm not seeing the
> >>>> purpose of the above if() block.
> >>> Why ? If the owner has changed and the new owner is running on another
> >>> cpu then why not spin further ?
> >> That's an interesting question, and I'm not sure what the right answer 
> >> is. The current approach of the adaptive spinning in the kernel is to 
> >> spin until the owner changes or deschedules, then stop and block. The 
> >> idea is that if you didn't get the lock before the owner changed, you 
> >> aren't going to get it in a very short period of time (you have at least 
> >> an entire critical section to wait through plus whatever time you've 
> >> already spent spinning). However, blocking just so another task can spin 
> >> doesn't really make sense either, and makes the lock less fair than it 
> >> could otherwise be.
> > 
> > Not only less fair, but potentially could cause starvation, no?  Perhaps
> > you could see this if you changed your model to allow all contended
> > tasks to spin instead of just one.  
> 
> Agreed, and V5 (just posted) does just that.
> 
> > 
> > If a spinning task blocks because of an owner change, and a new task
> > enters and starts spinning directly after the owner change, at what
> > point does the original task get woken up?
> 
> At the time of unlock the owner will have to call FUTEX_WAKE. This task 
> will wake and attempt to acquire the lock - it will lose races with 
> already running contenders. Lock stealing, adaptive spinning, etc. are 
> all going to lead to less fair locks in exchange for throughput.

Nod.  My only point was that if only one task can spin, and it sleeps
when the owner is not on a CPU, then you virtually guarantee worst-case
performance under certain conditions (i.e. multiple threads coming in
at various times).

If all spin, and they sleep only when the owner blocks, then the
'unfairness' is potentially distributed more evenly: by the hardware
(and lock stealing), and across the periods when the owner is blocked.


Best,
-PWM

> 
> >  It's likely that the new
> > spinner will get the resource next, no?  Rinse/repeat with another task
> > and the original spinner is starved.  
> > 
> > (Or am I missing something?  My understanding was that unfairness was
> > built into this algo... If so, then the above is a possibility, right?)
> 
> Yes it is. These locks are typically used in situations where it is more 
> important that some work gets completed than _which_ work gets completed.
> 
> Thanks,
> 
> --
> Darren Hart
> IBM Linux Technology Center
> Real-Time Linux Team



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive  spinning
  2010-04-06 17:34                   ` Ulrich Drepper
@ 2010-04-10 23:35                     ` Alan Cox
  2010-04-10 23:53                       ` Ulrich Drepper
  0 siblings, 1 reply; 62+ messages in thread
From: Alan Cox @ 2010-04-10 23:35 UTC (permalink / raw)
  To: Ulrich Drepper
  Cc: Darren Hart, Peter Zijlstra, Avi Kivity, linux-kernel,
	Thomas Gleixner, Ingo Molnar, Eric Dumazet, Peter W. Morreale,
	Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright

> The non-swappable part will be tricky.  One doesn't know how many
> threads will be created in a process.  This mechanism shouldn't put an
> arbitrary limit in place.  So where to allocate the memory?  Perhaps
> it's better to implicitly mark the memory page pointed to by the new
> syscall as non-swappable?  This could mean one page per thread...

You only need one page per 4096 threads or so if you create the page
on the first request (tied to, say, the signal handling and the like
shared by threaded tasks) and then hand out 'slots' in the page to
future calls until you've got 4096 live.

ie you'd see something like

	/* 1st call: kernel maps a page at x .. x + 4095, probably R/O,
	 * and hands out the first slot */
	addr = set_tid_notifier_addr();		/* returns x */

	/* next thread: kernel hands out the next slot in the same page */
	addr = set_tid_notifier_addr();		/* returns x + 1 */

Alan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive  spinning
  2010-04-10 23:35                     ` Alan Cox
@ 2010-04-10 23:53                       ` Ulrich Drepper
  0 siblings, 0 replies; 62+ messages in thread
From: Ulrich Drepper @ 2010-04-10 23:53 UTC (permalink / raw)
  To: Alan Cox
  Cc: Darren Hart, Peter Zijlstra, Avi Kivity, linux-kernel,
	Thomas Gleixner, Ingo Molnar, Eric Dumazet, Peter W. Morreale,
	Rik van Riel, Steven Rostedt, Gregory Haskins,
	Sven-Thorsten Dietrich, Chris Mason, John Cooper, Chris Wright

On Sat, Apr 10, 2010 at 16:35, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> You only need one page per 4096 threads

Very expensive.  Each cache line would be fought over by 64 threads.
Constant RFOs make context switches significantly slower.

At most 4096/64 = 64 threads should share one page, i.e. one 64-byte
cache line per thread.  One page would still cover almost all
processes.

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2010-04-10 23:53 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-05 20:23 [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
2010-04-05 20:23 ` [PATCH 1/6] futex: replace fshared and clockrt with combined flags Darren Hart
2010-04-05 20:23 ` [PATCH 2/6] futex: add futex_q static initializer Darren Hart
2010-04-05 20:23 ` [PATCH 3/6] futex: refactor futex_lock_pi_atomic Darren Hart
2010-04-05 20:23 ` [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning Darren Hart
2010-04-06 16:55   ` Thomas Gleixner
2010-04-07 17:26     ` Darren Hart
2010-04-07 19:59       ` Thomas Gleixner
2010-04-08  3:25         ` Darren Hart
2010-04-08 23:10           ` Peter W. Morreale
2010-04-09  5:41             ` Darren Hart
2010-04-09 13:13               ` Peter W. Morreale
2010-04-05 20:23 ` [PATCH 5/6] futex: handle timeout inside adaptive lock spin Darren Hart
2010-04-06  8:27   ` Thomas Gleixner
2010-04-07 17:31     ` Darren Hart
2010-04-07 18:44       ` Gregory Haskins
2010-04-07 23:15         ` Darren Hart
2010-04-05 20:23 ` [PATCH 6/6] futex: Add aggressive adaptive spinning argument to FUTEX_LOCK Darren Hart
2010-04-08  5:58   ` Darren Hart
2010-04-05 20:48 ` [PATCH V2^W V4 0/6][RFC] futex: FUTEX_LOCK with optional adaptive spinning Darren Hart
2010-04-05 21:15 ` [PATCH V2 " Avi Kivity
2010-04-05 21:54   ` Darren Hart
2010-04-05 22:21     ` Avi Kivity
2010-04-05 22:59       ` Darren Hart
2010-04-06 13:28         ` Avi Kivity
2010-04-06 13:35           ` Peter Zijlstra
2010-04-06 13:41             ` Avi Kivity
2010-04-06 14:09               ` Peter Zijlstra
2010-04-06 16:10                 ` Avi Kivity
2010-04-06 16:53                   ` Alan Cox
2010-04-06 13:51             ` Alan Cox
2010-04-06 15:28               ` Darren Hart
2010-04-06 16:06                 ` Avi Kivity
2010-04-06 16:14                   ` Thomas Gleixner
2010-04-06 16:20                     ` Avi Kivity
2010-04-07  6:18                       ` john cooper
2010-04-08  3:33                         ` Darren Hart
2010-04-09  5:52                           ` john cooper
2010-04-06 16:54                     ` Alan Cox
2010-04-06 18:15                       ` Thomas Gleixner
2010-04-06 16:44                 ` Alan Cox
2010-04-06 17:34                   ` Ulrich Drepper
2010-04-10 23:35                     ` Alan Cox
2010-04-10 23:53                       ` Ulrich Drepper
2010-04-06 19:31                   ` Thomas Gleixner
2010-04-06 20:02                     ` Ulrich Drepper
2010-04-06 23:16                       ` Thomas Gleixner
2010-04-06 23:36                         ` Darren Hart
2010-04-07  6:08                         ` drepper
2010-04-08  3:41                           ` Darren Hart
2010-04-08  4:29                             ` drepper
2010-04-07  5:33                     ` Avi Kivity
2010-04-06 21:22         ` Darren Hart
2010-04-05 23:15       ` Darren Hart
2010-04-05 23:29         ` Chris Wright
2010-04-06 13:30         ` Avi Kivity
2010-04-06  8:48   ` Peter Zijlstra
2010-04-06 14:47     ` Ulrich Drepper
2010-04-06 14:51       ` Peter Zijlstra
2010-04-06 15:33         ` Darren Hart
2010-04-06 15:37           ` Peter Zijlstra
2010-04-06 15:29 ` Peter Zijlstra
