* Tasks stuck in futex code (in 3.14-rc6)
@ 2014-03-19 15:26 ` Srikar Dronamraju
  0 siblings, 0 replies; 54+ messages in thread
From: Srikar Dronamraju @ 2014-03-19 15:26 UTC (permalink / raw)
  To: davidlohr; +Cc: torvalds, tglx, peterz, mingo, LKML, linuxppc-dev, benh, paulus

Hi, 

When running specjbb on a POWER7 NUMA box, I am seeing Java threads
getting stuck in futex code:

#  ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
 14808 pts/0    root     java         - 0 futex_wait_queue_me
 14925 pts/0    root     java         - 0 futex_wait_queue_me
#

In the stack traces, I see:
[ 1843.426591] Call Trace:
[ 1843.426595] [c0000017101d74d0] [0000000000000020] 0x20 (unreliable)
[ 1843.426601] [c0000017101d76a0] [c000000000014c50] .__switch_to+0x1e0/0x390
[ 1843.426607] [c0000017101d7750] [c0000000006ed314] .__schedule+0x364/0x8c0
[ 1843.426613] [c0000017101d79d0] [c000000000139c28] .futex_wait_queue_me+0xf8/0x1a0
[ 1843.426619] [c0000017101d7a60] [c00000000013afbc] .futex_wait+0x17c/0x2a0
[ 1843.426626] [c0000017101d7c10] [c00000000013cee4] .do_futex+0x254/0xd80
[ 1843.426631] [c0000017101d7d60] [c00000000013db2c] .SyS_futex+0x11c/0x1d0
[ 1843.426638] [c0000017101d7e30] [c000000000009efc] syscall_exit+0x0/0x7c
[ 1843.426643] java            S 00003fffa08b16a0     0 14812  14203 0x00000080
[ 1843.426650] Call Trace:
[ 1843.426653] [c00000170c6034d0] [c000001710b09cf8] 0xc000001710b09cf8 (unreliable)
[ 1843.426660] [c00000170c6036a0] [c000000000014c50] .__switch_to+0x1e0/0x390
[ 1843.426666] [c00000170c603750] [c0000000006ed314] .__schedule+0x364/0x8c0
[ 1843.426672] [c00000170c6039d0] [c000000000139c28] .futex_wait_queue_me+0xf8/0x1a0
[ 1843.426679] [c00000170c603a60] [c00000000013afbc] .futex_wait+0x17c/0x2a0
[ 1843.453383] [c00000170c603c10] [c00000000013cee4] .do_futex+0x254/0xd80
[ 1843.453389] [c00000170c603d60] [c00000000013db2c] .SyS_futex+0x11c/0x1d0
[ 1843.453395] [c00000170c603e30] [c000000000009efc] syscall_exit+0x0/0x7c
[ 1843.453400] java            S 00003fffa08b1a74     0 14813  14203 0x00000080
[ 1843.453407] Call Trace:

There are 332 tasks all stuck in futex_wait_queue_me().
I am able to reproduce this consistently.

In fact, I can reproduce this if the java_constraint is node, socket, or system.
However, I am not able to reproduce it if java_constraint is set to core.

I ran git bisect between v3.12 and v3.14-rc6 and found that

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b0c29f79ecea0b6fbcefc999e70f2843ae8306db

commit b0c29f79ecea0b6fbcefc999e70f2843ae8306db
Author: Davidlohr Bueso <davidlohr@hp.com>
Date:   Sun Jan 12 15:31:25 2014 -0800

futexes: Avoid taking the hb->lock if there's nothing to wake up

was the commit that's causing the threads to be stuck in futex code.

I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and confirmed that
reverting the commit solved the problem.

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-19 15:26 ` Srikar Dronamraju
@ 2014-03-19 15:47   ` Peter Zijlstra
  -1 siblings, 0 replies; 54+ messages in thread
From: Peter Zijlstra @ 2014-03-19 15:47 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: davidlohr, torvalds, tglx, mingo, LKML, linuxppc-dev, benh, paulus

On Wed, Mar 19, 2014 at 08:56:19PM +0530, Srikar Dronamraju wrote:
> There are 332 tasks all stuck in futex_wait_queue_me().
> I am able to reproduce this consistently.
> 
> Infact I can reproduce this if the java_constraint is either node, socket, system.
> However I am not able to reproduce if java_constraint is set to core.

What's any of that mean?

> I ran git bisect between v3.12 and v3.14-rc6 and found that
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b0c29f79ecea0b6fbcefc999e70f2843ae8306db
> 
> commit b0c29f79ecea0b6fbcefc999e70f2843ae8306db
> Author: Davidlohr Bueso <davidlohr@hp.com>
> Date:   Sun Jan 12 15:31:25 2014 -0800
> 
> futexes: Avoid taking the hb->lock if there's nothing to wake up
> 
> was the commit thats causing the threads to be stuck in futex.
> 
> I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and confirmed that
> reverting the commit solved the problem.

Joy,.. let me look at that with ppc in mind.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-19 15:26 ` Srikar Dronamraju
@ 2014-03-19 16:04   ` Linus Torvalds
  -1 siblings, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2014-03-19 16:04 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras

On Wed, Mar 19, 2014 at 8:26 AM, Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
>
> I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and confirmed that
> reverting the commit solved the problem.

Ok. I'll give Peter and Davidlohr a few days to perhaps find something
obvious, but I guess we'll need to revert it from 3.14 and try again
later unless some fix comes up quickly..

Oh well.

             Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-19 15:47   ` Peter Zijlstra
@ 2014-03-19 16:09     ` Srikar Dronamraju
  -1 siblings, 0 replies; 54+ messages in thread
From: Srikar Dronamraju @ 2014-03-19 16:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: davidlohr, torvalds, tglx, mingo, LKML, linuxppc-dev, benh, paulus

> > 
> > Infact I can reproduce this if the java_constraint is either node, socket, system.
> > However I am not able to reproduce if java_constraint is set to core.
> 
> What's any of that mean?
> 

Using the constraint, one can specify how many JVM instances should
participate in the specjbb run.

For example, on a 4-node box, I can specify 2 JVMs per constraint with
the constraint set to node, and specjbb will run with 8 instances of Java.

I was running with 1 JVM per constraint. When I set the constraint
to node or system, I keep seeing this problem. However, if I set the
constraint to core (which means running more instances of Java), the
problem is not seen. My guess is that the fewer the Java instances, the
easier it is to reproduce.
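
To spell out the mapping (1 JVM per constraint, as in my runs, using the
4-node example above):

  constraint=system  ->  1 Java instance for the whole box  (problem reproduces)
  constraint=node    ->  1 instance per NUMA node, i.e. 4   (problem reproduces)
  constraint=socket  ->  1 instance per socket              (problem reproduces)
  constraint=core    ->  1 instance per core (many more)    (problem not seen)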

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-19 15:47   ` Peter Zijlstra
@ 2014-03-19 17:08     ` Peter Zijlstra
  -1 siblings, 0 replies; 54+ messages in thread
From: Peter Zijlstra @ 2014-03-19 17:08 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: davidlohr, torvalds, tglx, mingo, LKML, linuxppc-dev, benh, paulus

On Wed, Mar 19, 2014 at 04:47:05PM +0100, Peter Zijlstra wrote:
> > I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and confirmed that
> > reverting the commit solved the problem.
> 
> Joy,.. let me look at that with ppc in mind.

OK; so while pretty much all the comments from that patch are utter
nonsense (what was I thinking), I cannot actually find a real bug.

But could you try the patch below, which replaces a control dependency with a
full barrier? The control flow is convoluted enough that I think the
control barrier isn't actually valid anymore, and that might indeed
explain the failure.


--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -119,42 +119,32 @@
  * sys_futex(WAIT, futex, val);
  *   futex_wait(futex, val);
  *
- *   waiters++;
- *   mb(); (A) <-- paired with -.
- *                              |
- *   lock(hash_bucket(futex));  |
- *                              |
- *   uval = *futex;             |
- *                              |        *futex = newval;
- *                              |        sys_futex(WAKE, futex);
- *                              |          futex_wake(futex);
- *                              |
- *                              `------->  mb(); (B)
- *   if (uval == val)
+ *
+ *   lock(hash_bucket(futex)); (A)
+ *
+ *   uval = *futex;
+ *                                       *futex = newval;
+ *                                       sys_futex(WAKE, futex);
+ *                                         futex_wake(futex);
+ *
+ *   if (uval == val) (B)		   smp_mb(); (D)
  *     queue();
- *     unlock(hash_bucket(futex));
- *     schedule();                         if (waiters)
+ *     unlock(hash_bucket(futex)); (C)
+ *     schedule();                         if (spin_is_locked(&hb_lock) ||
+ *					       (smp_rmb(), !plist_empty))) (E)
  *                                           lock(hash_bucket(futex));
  *                                           wake_waiters(futex);
  *                                           unlock(hash_bucket(futex));
  *
- * Where (A) orders the waiters increment and the futex value read -- this
- * is guaranteed by the head counter in the hb spinlock; and where (B)
- * orders the write to futex and the waiters read -- this is done by the
- * barriers in get_futex_key_refs(), through either ihold or atomic_inc,
- * depending on the futex type.
- *
- * This yields the following case (where X:=waiters, Y:=futex):
- *
- *	X = Y = 0
- *
- *	w[X]=1		w[Y]=1
- *	MB		MB
- *	r[Y]=y		r[X]=x
- *
- * Which guarantees that x==0 && y==0 is impossible; which translates back into
- * the guarantee that we cannot both miss the futex variable change and the
- * enqueue.
+ *
+ * Because of the acquire (A) and release (C) the futex value load and the 
+ * plist_add are guaranteed to be inside the locked region. Furthermore, the
+ * control dependency (B) ensures the futex load happens before the plist_add().
+ *
+ * On the wakeup side, the full barrier (D) separates the futex value write
+ * from the hb_lock load, and matches with the control dependency. The rmb (E)
+ * separates the spin_is_locked() read and the plist_head_empty() read, such
+ * that ..., matches with the release barrier (C).
  */
 
 #ifndef CONFIG_HAVE_FUTEX_CMPXCHG
@@ -250,7 +240,7 @@ static inline void futex_get_mm(union fu
 	/*
 	 * Ensure futex_get_mm() implies a full barrier such that
 	 * get_futex_key() implies a full barrier. This is relied upon
-	 * as full barrier (B), see the ordering comment above.
+	 * as full barrier (D), see the ordering comment above.
 	 */
 	smp_mb__after_atomic();
 }
@@ -308,10 +298,10 @@ static void get_futex_key_refs(union fut
 
 	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
 	case FUT_OFF_INODE:
-		ihold(key->shared.inode); /* implies MB (B) */
+		ihold(key->shared.inode); /* implies MB (D) */
 		break;
 	case FUT_OFF_MMSHARED:
-		futex_get_mm(key); /* implies MB (B) */
+		futex_get_mm(key); /* implies MB (D) */
 		break;
 	}
 }
@@ -385,7 +375,7 @@ get_futex_key(u32 __user *uaddr, int fsh
 	if (!fshared) {
 		key->private.mm = mm;
 		key->private.address = address;
-		get_futex_key_refs(key);  /* implies MB (B) */
+		get_futex_key_refs(key);  /* implies MB (D) */
 		return 0;
 	}
 
@@ -492,7 +482,7 @@ get_futex_key(u32 __user *uaddr, int fsh
 		key->shared.pgoff = basepage_index(page);
 	}
 
-	get_futex_key_refs(key); /* implies MB (B) */
+	get_futex_key_refs(key); /* implies MB (D) */
 
 out:
 	unlock_page(page_head);
@@ -1604,7 +1594,7 @@ static inline struct futex_hash_bucket *
 	hb = hash_futex(&q->key);
 	q->lock_ptr = &hb->lock;
 
-	spin_lock(&hb->lock); /* implies MB (A) */
+	spin_lock(&hb->lock);
 	return hb;
 }
 
@@ -1995,6 +1985,8 @@ static int futex_wait_setup(u32 __user *
 		goto retry;
 	}
 
+	smp_mb();
+
 	if (uval != val) {
 		queue_unlock(*hb);
 		ret = -EWOULDBLOCK;

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-19 17:08     ` Peter Zijlstra
@ 2014-03-19 18:06       ` Davidlohr Bueso
  -1 siblings, 0 replies; 54+ messages in thread
From: Davidlohr Bueso @ 2014-03-19 18:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srikar Dronamraju, torvalds, tglx, mingo, LKML, linuxppc-dev,
	benh, paulus, Paul E. McKenney

On Wed, 2014-03-19 at 18:08 +0100, Peter Zijlstra wrote:
> On Wed, Mar 19, 2014 at 04:47:05PM +0100, Peter Zijlstra wrote:
> > > I reverted b0c29f79ecea0b6fbcefc999e70f2843ae8306db on top of v3.14-rc6 and confirmed that
> > > reverting the commit solved the problem.
> > 
> > Joy,.. let me look at that with ppc in mind.

errr... just sat down to check email this morning. CC'ing Paul for
any subtle barrier issues.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-19 17:08     ` Peter Zijlstra
@ 2014-03-20  5:33       ` Srikar Dronamraju
  -1 siblings, 0 replies; 54+ messages in thread
From: Srikar Dronamraju @ 2014-03-20  5:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: davidlohr, torvalds, tglx, mingo, LKML, linuxppc-dev, benh,
	paulus, Paul McKenney

> > Joy,.. let me look at that with ppc in mind.
> 
> OK; so while pretty much all the comments from that patch are utter
> nonsense (what was I thinking), I cannot actually find a real bug.
> 
> But could you try the below which replaces a control dependency with a
> full barrier. The control flow is plenty convoluted that I think the
> control barrier isn't actually valid anymore and that might indeed
> explain the fail.
> 

Unfortunately the patch didn't help. I am still seeing tasks stuck:

# ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
14680 pts/0    root     java         - 0 futex_wait_queue_me
14797 pts/0    root     java         - 0 futex_wait_queue_me
# :> /var/log/messages
# echo t > /proc/sysrq-trigger 
# grep futex_wait_queue_me /var/log/messages | wc -l 
334
#

[ 6904.211478] Call Trace:
[ 6904.211481] [c000000fa1f1b4d0] [0000000000000020] 0x20 (unreliable)
[ 6904.211486] [c000000fa1f1b6a0] [c000000000015208] .__switch_to+0x1e8/0x330
[ 6904.211491] [c000000fa1f1b750] [c000000000702f00] .__schedule+0x360/0x8b0
[ 6904.211495] [c000000fa1f1b9d0] [c000000000147348] .futex_wait_queue_me+0xf8/0x1a0
[ 6904.211500] [c000000fa1f1ba60] [c0000000001486dc] .futex_wait+0x17c/0x2a0
[ 6904.211505] [c000000fa1f1bc10] [c00000000014a614] .do_futex+0x254/0xd80
[ 6904.211510] [c000000fa1f1bd60] [c00000000014b25c] .SyS_futex+0x11c/0x1d0
[ 6904.238874] [c000000fa1f1be30] [c00000000000a0fc] syscall_exit+0x0/0x7c
[ 6904.238879] java            S 00003fff825f6044     0 14682  14076 0x00000080

Is there any other information I can provide that would help?

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20  5:33       ` Srikar Dronamraju
@ 2014-03-20  5:56         ` Davidlohr Bueso
  -1 siblings, 0 replies; 54+ messages in thread
From: Davidlohr Bueso @ 2014-03-20  5:56 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, torvalds, tglx, mingo, LKML, linuxppc-dev, benh,
	paulus, Paul McKenney

On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote:
> > > Joy,.. let me look at that with ppc in mind.
> > 
> > OK; so while pretty much all the comments from that patch are utter
> > nonsense (what was I thinking), I cannot actually find a real bug.
> > 
> > But could you try the below which replaces a control dependency with a
> > full barrier. The control flow is plenty convoluted that I think the
> > control barrier isn't actually valid anymore and that might indeed
> > explain the fail.
> > 
> 
> Unfortunately the patch didnt help. Still seeing tasks stuck
> 
> # ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
> 14680 pts/0    root     java         - 0 futex_wait_queue_me
> 14797 pts/0    root     java         - 0 futex_wait_queue_me
> # :> /var/log/messages
> # echo t > /proc/sysrq-trigger 
> # grep futex_wait_queue_me /var/log/messages | wc -l 
> 334
> #
> 
> [ 6904.211478] Call Trace:
> [ 6904.211481] [c000000fa1f1b4d0] [0000000000000020] 0x20 (unreliable)
> [ 6904.211486] [c000000fa1f1b6a0] [c000000000015208] .__switch_to+0x1e8/0x330
> [ 6904.211491] [c000000fa1f1b750] [c000000000702f00] .__schedule+0x360/0x8b0
> [ 6904.211495] [c000000fa1f1b9d0] [c000000000147348] .futex_wait_queue_me+0xf8/0x1a0
> [ 6904.211500] [c000000fa1f1ba60] [c0000000001486dc] .futex_wait+0x17c/0x2a0
> [ 6904.211505] [c000000fa1f1bc10] [c00000000014a614] .do_futex+0x254/0xd80
> [ 6904.211510] [c000000fa1f1bd60] [c00000000014b25c] .SyS_futex+0x11c/0x1d0
> [ 6904.238874] [c000000fa1f1be30] [c00000000000a0fc] syscall_exit+0x0/0x7c
> [ 6904.238879] java            S 00003fff825f6044     0 14682  14076 0x00000080
> 
> Is there any other information that I provide that can help?

This problem suggests that we missed a wakeup for a task that was adding
itself to the queue in a wait path. And the only place that can happen
is with the hb spinlock check for any pending waiters. Just in case we
missed some assumption about checking the hash bucket spinlock as a way
of detecting any waiters (powerpc?), could you revert this commit and
try the original atomic operations variant:

https://lkml.org/lkml/2013/12/19/630
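
For reference, the hb->lock based check in question -- as it stands in
3.14-rc6 with b0c29f79ecea applied -- is:

static inline bool hb_waiters_pending(struct futex_hash_bucket *hb)
{
#ifdef CONFIG_SMP
	/*
	 * Tasks trying to enter the critical region are most likely
	 * potential waiters that will be added to the plist. Ensure
	 * that wakers won't miss to-be-slept tasks in the window between
	 * the wait call and the actual plist_add.
	 */
	if (spin_is_locked(&hb->lock))
		return true;
	smp_rmb(); /* Make sure we check the lock state first */

	return !plist_head_empty(&hb->chain);
#else
	return true;
#endif
}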


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20  5:33       ` Srikar Dronamraju
@ 2014-03-20  7:23         ` Peter Zijlstra
  -1 siblings, 0 replies; 54+ messages in thread
From: Peter Zijlstra @ 2014-03-20  7:23 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: davidlohr, torvalds, tglx, mingo, LKML, linuxppc-dev, benh,
	paulus, Paul McKenney

On Thu, Mar 20, 2014 at 11:03:50AM +0530, Srikar Dronamraju wrote:
> > > Joy,.. let me look at that with ppc in mind.
> > 
> > OK; so while pretty much all the comments from that patch are utter
> > nonsense (what was I thinking), I cannot actually find a real bug.
> > 
> > But could you try the below which replaces a control dependency with a
> > full barrier. The control flow is plenty convoluted that I think the
> > control barrier isn't actually valid anymore and that might indeed
> > explain the fail.
> > 
> 
> Unfortunately the patch didnt help. Still seeing tasks stuck

Aww bugger. I'll be traveling tomorrow and today is wasted getting
ready. So unless Davidlohr has anything we'll need to scrap this change.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20  5:56         ` Davidlohr Bueso
@ 2014-03-20 10:08           ` Srikar Dronamraju
  -1 siblings, 0 replies; 54+ messages in thread
From: Srikar Dronamraju @ 2014-03-20 10:08 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, torvalds, LKML, paulus, tglx, Paul McKenney,
	linuxppc-dev, mingo

> This problem suggests that we missed a wakeup for a task that was adding
> itself to the queue in a wait path. And the only place that can happen
> is with the hb spinlock check for any pending waiters. Just in case we
> missed some assumption about checking the hash bucket spinlock as a way
> of detecting any waiters (powerpc?), could you revert this commit and
> try the original atomic operations variant:
> 
> https://lkml.org/lkml/2013/12/19/630

I think the above URL and the commit id that I reverted, i.e.
git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b0c29f79ecea0b6fbcefc999
are the same.

Or am I missing something? 

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 10:08           ` Srikar Dronamraju
@ 2014-03-20 15:06             ` Davidlohr Bueso
  -1 siblings, 0 replies; 54+ messages in thread
From: Davidlohr Bueso @ 2014-03-20 15:06 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, torvalds, LKML, paulus, tglx, Paul McKenney,
	linuxppc-dev, mingo

On Thu, 2014-03-20 at 15:38 +0530, Srikar Dronamraju wrote:
> > This problem suggests that we missed a wakeup for a task that was adding
> > itself to the queue in a wait path. And the only place that can happen
> > is with the hb spinlock check for any pending waiters. Just in case we
> > missed some assumption about checking the hash bucket spinlock as a way
> > of detecting any waiters (powerpc?), could you revert this commit and
> > try the original atomic operations variant:
> > 
> > https://lkml.org/lkml/2013/12/19/630
> 
> I think the above url and the commit id that I reverted i.e
> git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=b0c29f79ecea0b6fbcefc999
> are the same.
> 
> Or am I missing something? 

No, please take a closer look; it is a different approach to the same
end.

diff --git a/kernel/futex.c b/kernel/futex.c
index 34ecd9d..35ff697 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -203,6 +203,7 @@ static const struct futex_q futex_q_init = {
  * waiting on a futex.
  */
 struct futex_hash_bucket {
+	atomic_t waiters;
 	spinlock_t lock;
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
@@ -211,6 +212,53 @@ static unsigned long __read_mostly futex_hashsize;
 
 static struct futex_hash_bucket *futex_queues;
 
+static inline void futex_get_mm(union futex_key *key)
+{
+	atomic_inc(&key->private.mm->mm_count);
+#ifdef CONFIG_SMP
+	/*
+	 * Ensure futex_get_mm() implies a full barrier such that
+	 * get_futex_key() implies a full barrier. This is relied upon
+	 * as full barrier (B), see the ordering comment above.
+	 */
+	smp_mb__after_atomic_inc();
+#endif
+}
+
+/*
+ * Reflects a new waiter being added to the waitqueue.
+ */
+static inline void hb_waiters_inc(struct futex_hash_bucket *hb)
+{
+#ifdef CONFIG_SMP
+	atomic_inc(&hb->waiters);
+	/*
+	 * Full barrier (A), see the ordering comment above.
+	 */
+	smp_mb__after_atomic_inc();
+#endif
+}
+
+/*
+ * Reflects a waiter being removed from the waitqueue by wakeup
+ * paths.
+ */
+static inline void hb_waiters_dec(struct futex_hash_bucket *hb)
+{
+#ifdef CONFIG_SMP
+	atomic_dec(&hb->waiters);
+#endif
+}
+
+static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
+{
+#ifdef CONFIG_SMP
+	return atomic_read(&hb->waiters);
+#else
+	return 1;
+#endif
+}
+
 /*
  * We hash on the keys returned from get_futex_key (see below).
  */
@@ -245,10 +293,10 @@ static void get_futex_key_refs(union futex_key *key)
 
 	switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
 	case FUT_OFF_INODE:
-		ihold(key->shared.inode);
+		ihold(key->shared.inode); /* implies MB */
 		break;
 	case FUT_OFF_MMSHARED:
-		atomic_inc(&key->private.mm->mm_count);
+		futex_get_mm(key); /* implies MB */
 		break;
 	}
 }
@@ -322,7 +370,7 @@ get_futex_key(u32 __user *uaddr, int fshared, union futex_key *key, int rw)
 	if (!fshared) {
 		key->private.mm = mm;
 		key->private.address = address;
-		get_futex_key_refs(key);
+		get_futex_key_refs(key); /* implies MB (B) */
 		return 0;
 	}
 
@@ -429,7 +477,7 @@ again:
 		key->shared.pgoff = basepage_index(page);
 	}
 
-	get_futex_key_refs(key);
+	get_futex_key_refs(key); /* implies MB (B) */
 
 out:
 	unlock_page(page_head);
@@ -893,6 +941,7 @@ static void __unqueue_futex(struct futex_q *q)
 
 	hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock);
 	plist_del(&q->list, &hb->chain);
+	hb_waiters_dec(hb);
 }
 
 /*
@@ -1052,6 +1101,11 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 		goto out;
 
 	hb = hash_futex(&key);
+
+	/* Make sure we really have tasks to wakeup */
+	if (!hb_waiters_pending(hb))
+		goto out_put_key;
+
 	spin_lock(&hb->lock);
 
 	plist_for_each_entry_safe(this, next, &hb->chain, list) {
@@ -1072,6 +1126,7 @@ futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
 	}
 
 	spin_unlock(&hb->lock);
+out_put_key:
 	put_futex_key(&key);
 out:
 	return ret;
@@ -1190,7 +1245,9 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
 	 */
 	if (likely(&hb1->chain != &hb2->chain)) {
 		plist_del(&q->list, &hb1->chain);
+		hb_waiters_dec(hb1);
 		plist_add(&q->list, &hb2->chain);
+		hb_waiters_inc(hb2);
 		q->lock_ptr = &hb2->lock;
 	}
 	get_futex_key_refs(key2);
@@ -1533,6 +1590,17 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q)
 	struct futex_hash_bucket *hb;
 
 	hb = hash_futex(&q->key);
+
+	/*
+	 * Increment the counter before taking the lock so that
+	 * a potential waker won't miss a to-be-slept task that is
+	 * waiting for the spinlock. This is safe as all queue_lock()
+	 * users end up calling queue_me(). Similarly, for housekeeping,
+	 * decrement the counter at queue_unlock() when some error has
+	 * occurred and we don't end up adding the task to the list.
+	 */
+	hb_waiters_inc(hb);
+
 	q->lock_ptr = &hb->lock;
 
 	spin_lock(&hb->lock);
@@ -1544,6 +1612,7 @@ queue_unlock(struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
 	spin_unlock(&hb->lock);
+	hb_waiters_dec(hb);
 }
 
 /**
@@ -2275,6 +2344,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
 		 * Unqueue the futex_q and determine which it was.
 		 */
 		plist_del(&q->list, &hb->chain);
+		hb_waiters_dec(hb);
 
 		/* Handle spurious wakeups gracefully */
 		ret = -EWOULDBLOCK;
@@ -2808,6 +2878,7 @@ static int __init futex_init(void)
 		futex_cmpxchg_enabled = 1;
 
 	for (i = 0; i < futex_hashsize; i++) {
+		atomic_set(&futex_queues[i].waiters, 0);
 		plist_head_init(&futex_queues[i].chain);
 		spin_lock_init(&futex_queues[i].lock);
 	}




^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20  5:56         ` Davidlohr Bueso
@ 2014-03-20 16:31           ` Davidlohr Bueso
  -1 siblings, 0 replies; 54+ messages in thread
From: Davidlohr Bueso @ 2014-03-20 16:31 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Peter Zijlstra, torvalds, tglx, mingo, LKML, linuxppc-dev, benh,
	paulus, Paul McKenney

On Wed, 2014-03-19 at 22:56 -0700, Davidlohr Bueso wrote:
> On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote:
> > > > Joy,.. let me look at that with ppc in mind.
> > > 
> > > OK; so while pretty much all the comments from that patch are utter
> > > nonsense (what was I thinking), I cannot actually find a real bug.
> > > 
> > > But could you try the below which replaces a control dependency with a
> > > full barrier. The control flow is plenty convoluted that I think the
> > > control barrier isn't actually valid anymore and that might indeed
> > > explain the fail.
> > > 
> > 
> > Unfortunately the patch didnt help. Still seeing tasks stuck
> > 
> > # ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
> > 14680 pts/0    root     java         - 0 futex_wait_queue_me
> > 14797 pts/0    root     java         - 0 futex_wait_queue_me
> > # :> /var/log/messages
> > # echo t > /proc/sysrq-trigger 
> > # grep futex_wait_queue_me /var/log/messages | wc -l 
> > 334
> > #
> > 
> > [ 6904.211478] Call Trace:
> > [ 6904.211481] [c000000fa1f1b4d0] [0000000000000020] 0x20 (unreliable)
> > [ 6904.211486] [c000000fa1f1b6a0] [c000000000015208] .__switch_to+0x1e8/0x330
> > [ 6904.211491] [c000000fa1f1b750] [c000000000702f00] .__schedule+0x360/0x8b0
> > [ 6904.211495] [c000000fa1f1b9d0] [c000000000147348] .futex_wait_queue_me+0xf8/0x1a0
> > [ 6904.211500] [c000000fa1f1ba60] [c0000000001486dc] .futex_wait+0x17c/0x2a0
> > [ 6904.211505] [c000000fa1f1bc10] [c00000000014a614] .do_futex+0x254/0xd80
> > [ 6904.211510] [c000000fa1f1bd60] [c00000000014b25c] .SyS_futex+0x11c/0x1d0
> > [ 6904.238874] [c000000fa1f1be30] [c00000000000a0fc] syscall_exit+0x0/0x7c
> > [ 6904.238879] java            S 00003fff825f6044     0 14682  14076 0x00000080
> > 
> > Is there any other information that I provide that can help?
> 
> This problem suggests that we missed a wakeup for a task that was adding
> itself to the queue in a wait path. And the only place that can happen
> is with the hb spinlock check for any pending waiters. Just in case we
> missed some assumption about checking the hash bucket spinlock as a way
> of detecting any waiters (powerpc?), could you revert this commit and
> try the original atomic operations variant:
> 
> https://lkml.org/lkml/2013/12/19/630

Hmmm, looking at the ppc spinlock code, it seems that it doesn't have ticket
spinlocks -- in fact Torsten Duwe has been trying to get them upstream
very recently. Since we rely on the counter for detecting waiters, this
might explain the issue. Could someone confirm this spinlock
implementation difference?
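
Roughly, the difference is (simplified from the 3.14-era x86 and powerpc
definitions, so read this as a sketch rather than the exact code):

	/* x86: ticket lock. A CPU queued for the lock has already bumped
	 * ->tail, so a contended lock keeps reading as "locked" even
	 * across a release while waiters remain: */
	static inline int arch_spin_is_locked(arch_spinlock_t *lock)
	{
		struct __raw_tickets tmp = ACCESS_ONCE(lock->tickets);
		return tmp.tail != tmp.head;
	}

	/* powerpc: plain lock word. A spinning CPU does not touch ->slock
	 * until it actually acquires the lock, so only the current holder
	 * is visible: */
	static inline int arch_spin_is_locked(arch_spinlock_t *lock)
	{
		return lock->slock != 0;
	}

So with ticket locks a CPU merely queued on hb->lock keeps the lock
looking held, while with the powerpc lock word the waiters become
invisible the instant the previous holder releases.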


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20  5:56         ` Davidlohr Bueso
@ 2014-03-20 16:41           ` Linus Torvalds
  -1 siblings, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2014-03-20 16:41 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

[-- Attachment #1: Type: text/plain, Size: 1678 bytes --]

On Wed, Mar 19, 2014 at 10:56 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>
> This problem suggests that we missed a wakeup for a task that was adding
> itself to the queue in a wait path. And the only place that can happen
> is with the hb spinlock check for any pending waiters.

Ok, so thinking about hb_waiters_pending():

 - spin_is_locked() test
 - read barrier
 - plist_head_empty() test

seems to be broken after all.

The race is against futex_wait() that does

 - futex_wait_setup():
   - queue_lock()
   - get_futex_value_locked()
 - futex_wait_queue_me()
   - queue_me()
     - plist_add()

right?

It strikes me that the "spin_is_locked()" test has no barriers wrt the
writing of the new futex value on the wake path. And the read barrier
obviously does nothing wrt the write either. Or am I missing
something? So the write that actually released the futex might be
almost arbitrarily delayed on the waking side. So the waiting side may
not see the new value, even though the waker assumes it does due to
the ordering of it doing the write first.

So maybe we need a memory barrier in hb_waiters_pending() just to make
sure that the new value is written.
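
To make the suspected window concrete, here is a standalone model of the
two paths (a simplified sketch with made-up names, using C11 atomics and
pthreads -- it is not the futex code itself):

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int futex_word;	/* stands in for the user-space futex value */
static atomic_int queue_len;	/* stands in for the hb plist contents */
static pthread_mutex_t hb_lock = PTHREAD_MUTEX_INITIALIZER;

/* Wake side: the new value is stored, then "any waiters?" is checked. */
static void waker(void)
{
	atomic_store_explicit(&futex_word, 1, memory_order_relaxed);
	/*
	 * Nothing orders the store above against the check below, so the
	 * "no waiters" answer can be computed before the new value is
	 * visible to the wait side.
	 */
	if (atomic_load_explicit(&queue_len, memory_order_relaxed) == 0)
		return;				/* skip the wakeup entirely */
	pthread_mutex_lock(&hb_lock);
	/* ... wake whoever is queued ... */
	pthread_mutex_unlock(&hb_lock);
}

/* Wait side: queue only if the futex still holds the old value. */
static bool waiter(void)
{
	pthread_mutex_lock(&hb_lock);
	if (atomic_load_explicit(&futex_word, memory_order_relaxed) != 0) {
		pthread_mutex_unlock(&hb_lock);
		return false;			/* value changed, don't sleep */
	}
	atomic_fetch_add(&queue_len, 1);	/* the plist_add() step */
	pthread_mutex_unlock(&hb_lock);
	/* ... go to sleep; a missed wakeup strands us here forever ... */
	return true;
}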

But at that point, I suspect that Davidlohr's original patch that just
had explicit waiting counts is just as well. The whole point with the
head empty test was to emulate that "do we have waiters" without
having to actually add the atomics, but a memory barrier is really no
worse.

The attached is a TOTALLY UNTESTED interdiff that adds back Davidlohr's
atomic count. It may be terminally broken, I literally didn't test it
at all.

Davidlohr, mind checking and correcting this?

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 3576 bytes --]

 kernel/futex.c | 53 +++++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 43 insertions(+), 10 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 44a1261cb9ff..08ec814ad9d2 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -234,6 +234,7 @@ static const struct futex_q futex_q_init = {
  * waiting on a futex.
  */
 struct futex_hash_bucket {
+	atomic_t waiters;
 	spinlock_t lock;
 	struct plist_head chain;
 } ____cacheline_aligned_in_smp;
@@ -253,22 +254,37 @@ static inline void futex_get_mm(union futex_key *key)
 	smp_mb__after_atomic_inc();
 }
 
-static inline bool hb_waiters_pending(struct futex_hash_bucket *hb)
+/*
+ * Reflects a new waiter being added to the waitqueue.
+ */
+static inline void hb_waiters_inc(struct futex_hash_bucket *hb)
 {
 #ifdef CONFIG_SMP
+	atomic_inc(&hb->waiters);
 	/*
-	 * Tasks trying to enter the critical region are most likely
-	 * potential waiters that will be added to the plist. Ensure
-	 * that wakers won't miss to-be-slept tasks in the window between
-	 * the wait call and the actual plist_add.
+	 * Full barrier (A), see the ordering comment above.
 	 */
-	if (spin_is_locked(&hb->lock))
-		return true;
-	smp_rmb(); /* Make sure we check the lock state first */
+	smp_mb__after_atomic_inc();
+#endif
+}
+
+/*
+ * Reflects a waiter being removed from the waitqueue by wakeup
+ * paths.
+ */
+static inline void hb_waiters_dec(struct futex_hash_bucket *hb)
+{
+#ifdef CONFIG_SMP
+	atomic_dec(&hb->waiters);
+#endif
+}
 
-	return !plist_head_empty(&hb->chain);
+static inline int hb_waiters_pending(struct futex_hash_bucket *hb)
+{
+#ifdef CONFIG_SMP
+	return atomic_read(&hb->waiters);
 #else
-	return true;
+	return 1;
 #endif
 }
 
@@ -954,6 +970,7 @@ static void __unqueue_futex(struct futex_q *q)
 
 	hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock);
 	plist_del(&q->list, &hb->chain);
+	hb_waiters_dec(hb);
 }
 
 /*
@@ -1257,7 +1274,9 @@ void requeue_futex(struct futex_q *q, struct futex_hash_bucket *hb1,
 	 */
 	if (likely(&hb1->chain != &hb2->chain)) {
 		plist_del(&q->list, &hb1->chain);
+		hb_waiters_dec(hb1);
 		plist_add(&q->list, &hb2->chain);
+		hb_waiters_inc(hb2);
 		q->lock_ptr = &hb2->lock;
 	}
 	get_futex_key_refs(key2);
@@ -1600,6 +1619,17 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q)
 	struct futex_hash_bucket *hb;
 
 	hb = hash_futex(&q->key);
+
+	/*
+	 * Increment the counter before taking the lock so that
+	 * a potential waker won't miss a to-be-slept task that is
+	 * waiting for the spinlock. This is safe as all queue_lock()
+	 * users end up calling queue_me(). Similarly, for housekeeping,
+	 * decrement the counter at queue_unlock() when some error has
+	 * occurred and we don't end up adding the task to the list.
+	 */
+	hb_waiters_inc(hb);
+
 	q->lock_ptr = &hb->lock;
 
 	spin_lock(&hb->lock); /* implies MB (A) */
@@ -1611,6 +1641,7 @@ queue_unlock(struct futex_hash_bucket *hb)
 	__releases(&hb->lock)
 {
 	spin_unlock(&hb->lock);
+	hb_waiters_dec(hb);
 }
 
 /**
@@ -2342,6 +2373,7 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
 		 * Unqueue the futex_q and determine which it was.
 		 */
 		plist_del(&q->list, &hb->chain);
+		hb_waiters_dec(hb);
 
 		/* Handle spurious wakeups gracefully */
 		ret = -EWOULDBLOCK;
@@ -2875,6 +2907,7 @@ static int __init futex_init(void)
 		futex_cmpxchg_enabled = 1;
 
 	for (i = 0; i < futex_hashsize; i++) {
+		atomic_set(&futex_queues[i].waiters, 0);
 		plist_head_init(&futex_queues[i].chain);
 		spin_lock_init(&futex_queues[i].lock);
 	}

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 16:41           ` Linus Torvalds
@ 2014-03-20 17:18             ` Davidlohr Bueso
  -1 siblings, 0 replies; 54+ messages in thread
From: Davidlohr Bueso @ 2014-03-20 17:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, 2014-03-20 at 09:41 -0700, Linus Torvalds wrote:
> On Wed, Mar 19, 2014 at 10:56 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
> >
> > This problem suggests that we missed a wakeup for a task that was adding
> > itself to the queue in a wait path. And the only place that can happen
> > is with the hb spinlock check for any pending waiters.
> 
> Ok, so thinking about hb_waiters_pending():
> 
>  - spin_is_locked() test
>  - read barrier
>  - plist_head_empty() test
> 
> seems to be broken after all.
> 
> The race is against futex_wait() that does
> 
>  - futex_wait_setup():
>    - queue_lock()
>    - get_futex_value_locked()
>  - futex_wait_queue_me()
>    - queue_me()
>      - plist_add()
> 
> right?

Yep.

> 
> It strikes me that the "spin_is_locked()" test has no barriers wrt the
> writing of the new futex value on the wake path. And the read barrier
> obviously does nothing wrt the write either. Or am I missing
> something? So the write that actually released the futex might be
> almost arbitrarily delayed on the waking side. So the waiting side may
> not see the new value, even though the waker assumes it does due to
> the ordering of it doing the write first.

Aha, that would certainly violate the ordering guarantees. I feared
_something_ like that when we originally discussed your suggestion as
opposed to the atomics one, but didn't have any case for it either.

> So maybe we need a memory barrier in hb_waiters_pending() just to make
> sure that the new value is written.
> 
> But at that point, I suspect that Davidlohrs original patch that just
> had explicit waiting counts is just as well. The whole point with the
> head empty test was to emulate that "do we have waiters" without
> having to actually add the atomics, but a memory barrier is really no
> worse.
> 
> The attached is a TOTALLY UNTESTED interdiff that adds back Davidlohrs
> atomic count. It may be terminally broken, I literally didn't test it
> at all.

Comparing with the patch I sent earlier this morning, looks equivalent,
and fwiw, passes my initial qemu bootup, which is the first way of
detecting anything stupid going on.

So, Srikar, please try this patch out, as opposed to mine, you don't
have to first revert the commit in question.


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 17:18             ` Davidlohr Bueso
@ 2014-03-20 17:42               ` Linus Torvalds
  -1 siblings, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2014-03-20 17:42 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>> It strikes me that the "spin_is_locked()" test has no barriers wrt the
>> writing of the new futex value on the wake path. And the read barrier
>> obviously does nothing wrt the write either. Or am I missing
>> something? So the write that actually released the futex might be
>> almost arbitrarily delayed on the waking side. So the waiting side may
>> not see the new value, even though the waker assumes it does due to
>> the ordering of it doing the write first.
>
> Aha, that would certainly violate the ordering guarantees. I feared
> _something_ like that when we originally discussed your suggestion as
> opposed to the atomics one, but didn't have any case for it either.

Actually, looking closer, we have the memory barrier in
get_futex_key_refs() (called by "get_futex_key()") so that's not it.
In fact, your "atomic_read(&hb->waiters)" doesn't have any more
serialization than the spin_is_locked() test had.

But the spin_is_locked() and queue-empty tests are two separate memory
reads, and maybe there is some ordering wrt those two that we missed,
so the "waiters" patch is worth trying anyway.

I do still dislike how the "waiters" thing adds an atomic update, but whatever..

          Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 17:42               ` Linus Torvalds
@ 2014-03-20 18:03                 ` Davidlohr Bueso
  -1 siblings, 0 replies; 54+ messages in thread
From: Davidlohr Bueso @ 2014-03-20 18:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, 2014-03-20 at 10:42 -0700, Linus Torvalds wrote:
> On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso <davidlohr@hp.com> wrote:
> >> It strikes me that the "spin_is_locked()" test has no barriers wrt the
> >> writing of the new futex value on the wake path. And the read barrier
> >> obviously does nothing wrt the write either. Or am I missing
> >> something? So the write that actually released the futex might be
> >> almost arbitrarily delayed on the waking side. So the waiting side may
> >> not see the new value, even though the waker assumes it does due to
> >> the ordering of it doing the write first.
> >
> > Aha, that would certainly violate the ordering guarantees. I feared
> > _something_ like that when we originally discussed your suggestion as
> > opposed to the atomics one, but didn't have any case for it either.
> 
> Actually, looking closer, we have the memory barrier in
> get_futex_key_refs() (called by "get_futex_key()") so that's not it.
> In fact, your "atomic_read(&hb->waiters)" doesn't have any more
> serialization than the spin_is_locked() test had.
> 
> But the spin_is_locked() and queue-empty tests are two separate memory
> reads, and maybe there is some ordering wrt those two that we missed,
> so the "waiters" patch is worth trying anyway.

Well, imho we would have seen something wrong much much earlier. This
patch has been very heavily tested (including with the java workload
used by Srikar).

I still wonder about ppc and spinlocks (no ticketing!!) ... sure the
"waiters" patch might fix the problem just because we explicitly count
the members of the plist. And I guess if we cannot rely on all archs
having an equivalent spinlock implementation, we simply cannot use this
technique for futexes.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 18:03                 ` Davidlohr Bueso
@ 2014-03-20 18:16                   ` Linus Torvalds
  -1 siblings, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2014-03-20 18:16 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, Mar 20, 2014 at 11:03 AM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>
> I still wonder about ppc and spinlocks (no ticketing!!) ... sure the
> "waiters" patch might fix the problem just because we explicitly count
> the members of the plist. And I guess if we cannot rely on all archs
> having an equivalent spinlock implementation, we simply cannot use this
> technique for futexes.

So the ticketing part means that on x86 we see pending waiters even
when a previous one does "spin_unlock()". I agree that that is a
fundamental difference between x86 and powerpc, and it does seem to be
the most likely culprit.
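
Roughly, the difference looks like this (a made-up userspace sketch with
C11 atomics, not the kernel's actual x86 or powerpc lock code):

#include <stdatomic.h>
#include <stdbool.h>

/* Ticket lock: a "next ticket" counter and a "now serving" counter. */
struct ticket_lock { atomic_uint next, owner; };

static void ticket_lock(struct ticket_lock *l)
{
	unsigned int t = atomic_fetch_add(&l->next, 1);	/* take a ticket */
	while (atomic_load(&l->owner) != t)
		;					/* spin until served */
}

static void ticket_unlock(struct ticket_lock *l)
{
	atomic_fetch_add(&l->owner, 1);
}

/*
 * "Locked" means next != owner, which stays true as long as anyone
 * still holds an unserved ticket -- so right after an unlock, a waker
 * doing the is-locked test still sees the queued spinners.
 */
static bool ticket_is_locked(struct ticket_lock *l)
{
	return atomic_load(&l->next) != atomic_load(&l->owner);
}

/* Simple exchange-based lock, i.e. no tickets. */
struct simple_lock { atomic_uint held; };

static void simple_lock(struct simple_lock *l)
{
	while (atomic_exchange(&l->held, 1))
		;					/* spin */
}

static void simple_unlock(struct simple_lock *l)
{
	atomic_store(&l->held, 0);
}

/*
 * The word is 0 again the instant the holder releases it, no matter how
 * many CPUs are still spinning in simple_lock() -- the is-locked test
 * carries no information about waiters here.
 */
static bool simple_is_locked(struct simple_lock *l)
{
	return atomic_load(&l->held) != 0;
}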

And dammit, I *liked* my "don't use an explicit waiter count"
approach, so I'd love to be able to do it. But we've never really
guaranteed that "spin_is_locked()" shows whether there are spinners,
so I don't know how to do that.

I guess we could expose some interface for the spinlock code to say
whether it supports that or not, and then switch between the two
algorithms. But that just feels very very ugly to me.

But let's see if the explicit waiter count version even solves the
thing on powerpc. Maybe it's something else, and we'll have to revert
entirely for now.

                 Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 17:18             ` Davidlohr Bueso
@ 2014-03-20 18:36               ` Linus Torvalds
  -1 siblings, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2014-03-20 18:36 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>
> Comparing with the patch I sent earlier this morning, looks equivalent,
> and fwiw, passes my initial qemu bootup, which is the first way of
> detecting anything stupid going on.
>
> So, Srikar, please try this patch out, as opposed to mine, you don't
> have to first revert the commit in question.

Ok, so it boots for me too, so hopefully it isn't totally broken.

However, since it's just closing a race, and since getting the counts
wrong should easily result in it *working* but always taking the slow
path (for example), I'd really like people to also verify that it
fixes the actual performance issue (ie assuming it fixes powerpc
behavior for Srikar, I'd like to get it double-checked that it also
avoids the spinlock in the common case). Because if the
increment/decrement pairings end up being wrong, we could have a
situation where the waiter count just ends up bogus, and it all works
from a correctness standpoint but not from the intended performance
optimization.

No way I can test that sanely on my single-socket machine. Davidlohr?

                  Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 18:36               ` Linus Torvalds
@ 2014-03-20 19:08                 ` Davidlohr Bueso
  -1 siblings, 0 replies; 54+ messages in thread
From: Davidlohr Bueso @ 2014-03-20 19:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, 2014-03-20 at 11:36 -0700, Linus Torvalds wrote:
> On Thu, Mar 20, 2014 at 10:18 AM, Davidlohr Bueso <davidlohr@hp.com> wrote:
> >
> > Comparing with the patch I sent earlier this morning, looks equivalent,
> > and fwiw, passes my initial qemu bootup, which is the first way of
> > detecting anything stupid going on.
> >
> > So, Srikar, please try this patch out, as opposed to mine, you don't
> > have to first revert the commit in question.
> 
> Ok, so it boots for me too, so hopefully it isn't totally broken.
> 
> However, since it's just closing a race, and since getting the counts
> wrong should easily result in it *working* but always taking the slow
> path (for example), I'd really like people to also verify that it
> fixes the actual performance issue (ie assuming it fixes powerpc
> behavior for Srikar, I'd like to get it double-checked that it also
> avoids the spinlock in the common case). 

Oh, it does. This atomics technique was tested at a customer's site and
was ready for upstream. To refresh, we were originally seeing massive
contention on the hb->lock and an enormous amount of 0 returns from
futex_wake, indicating that spinners were piling up just to realize
that the plist was empty! While I don't have any official numbers, I can
confirm that perf showed that this issue was addressed with the atomics
variant. Yes, such pathological behavior shows problems in the userspace
locking primitives design/implementation, but allowing the kernel not to
be affected by suboptimal uses of futexes is definitely a plus. 
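
For reference, the zero returns are visible straight from user space:
FUTEX_WAKE reports how many waiters it actually woke, so an illustrative
snippet like the one below prints 0 when nobody is queued -- and without
the fast path, every such call still has to take hb->lock first.

#include <linux/futex.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static int futex_word;		/* nobody ever waits on this word */

int main(void)
{
	/* raw futex(2): wake up to one waiter on &futex_word */
	long woken = syscall(SYS_futex, &futex_word, FUTEX_WAKE, 1,
			     NULL, NULL, 0);

	/* prints 0 -- there were no waiters, the call was pure overhead */
	printf("futex_wake woke %ld waiter(s)\n", woken);
	return 0;
}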

As tglx suggested at the time, I also made sure that adding the barriers
when doing the key refcounting didn't impose any serious restrictions to
performance either.

Now, what at the time required re-testing everything was when you
suggested replacing this approach with the more elegant spin_is_locked()
test. Both approaches showed pretty much identical performance (and
correctness, at least on x86). And to this day it shows *significantly* less
time spent in kernel space dealing with futexes.


> Because if the
> increment/decrement pairings end up being wrong, we could have a
> situation where the waiter count just ends up bogus, and it all works
> from a correctness standpoint but not from the intended performance
> optimization.
> 
> No way I can test that sanely on my single-socket machine. Davidlohr?

Not this patch, no :( -- we could never blindly reproduce the customer's
workload. The only patch that I was able to create test cases for is the
larger hash table one, which simply alleviates collisions. This is now
part of perf-bench.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 19:08                 ` Davidlohr Bueso
@ 2014-03-20 19:25                   ` Linus Torvalds
  -1 siblings, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2014-03-20 19:25 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, Mar 20, 2014 at 12:08 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>
> Oh, it does. This atomics technique was tested at a customer's site and
> ready for upstream.

I'm not worried about the *original* patch. I'm worried about the
incremental one.

Your original patch never applied to my tree - I think it was based on
-mm or something. So I couldn't verify my "let's go back to the
explicit 'waiters'" incremental patch against reverting and
re-applying the original patch.

So I'd like you to re-verify that that incremental patch really is
solid, and does what your original one did.

               Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 19:25                   ` Linus Torvalds
@ 2014-03-20 20:20                     ` Davidlohr Bueso
  -1 siblings, 0 replies; 54+ messages in thread
From: Davidlohr Bueso @ 2014-03-20 20:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, 2014-03-20 at 12:25 -0700, Linus Torvalds wrote:
> On Thu, Mar 20, 2014 at 12:08 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
> >
> > Oh, it does. This atomics technique was tested at a customer's site and
> > ready for upstream.
> 
> I'm not worried about the *original* patch. I'm worried about the
> incremental one.
> 
> Your original patch never applied to my tree - I think it was based on
> -mm or something. So I couldn't verify my "let's go back to the
> explicit 'waiters'" incremental patch against reverting and
> re-applying the original patch.

Ok, so a big reason why this patch doesn't apply cleanly after reverting
is because *most* of the changes were done at the top of the file with
regards to documenting the ordering guarantees; the actual code changes
are quite minimal.

I reverted commits 99b60ce6 (documentation) and b0c29f79 (the offending
commit), and then I cleanly applied the equivalent ones from v3 of the
series (which was already *tested* and ready for upstream until you
suggested looking into the alternative spinlock approach):

https://lkml.org/lkml/2013/12/19/624
https://lkml.org/lkml/2013/12/19/630

Assuming the atomics solves the issue, would you be willing to take this
path? Any pending documentation fixes can be added afterwards. The
important thing is that the actual code is well tested.

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 16:31           ` Davidlohr Bueso
@ 2014-03-20 20:23             ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 54+ messages in thread
From: Benjamin Herrenschmidt @ 2014-03-20 20:23 UTC (permalink / raw)
  To: torvalds
  Cc: Srikar Dronamraju, Peter Zijlstra, tglx, mingo, LKML,
	linuxppc-dev, paulus, Paul McKenney, Davidlohr Bueso

On Thu, 2014-03-20 at 09:31 -0700, Davidlohr Bueso wrote:
> hmmm looking at ppc spinlock code, it seems that it doesn't have ticket
> spinlocks -- in fact Torsten Duwe has been trying to get them upstream
> very recently. Since we rely on the counter for detecting waiters, this
> might explain the issue. Could someone confirm this spinlock
> implementation difference? 

Indeed. I haven't merged ticket locks because they break lockref :-(

We have a problem here because we need to store the lock holder so we
can yield to the lock holder's partition on contention, and we are running
out of space in the spinlock.

The lock holder doesn't have to be atomic, so in theory we could have
the tickets and the lockref in the same 64-bit and the holder separately
but the way the layers are stacked at the moment that's not workable,
at least not without duplicating the whole lockref implementation and
breaking the spinlock in two, a "base" lock without holder and the separate
variant with holder field. A mess...
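
Just to illustrate the kind of layout I mean (purely hypothetical, not
what the ppc code does today):

#include <stdint.h>

/*
 * One possible split: keep the ticket pair and the lockref count in a
 * single 64-bit word that cmpxchg can cover as one unit, and move the
 * holder (only needed for yielding to the holder's partition) out of
 * that word.
 */
struct ppc_lock_idea {
	union {
		uint64_t lock_count;		/* what lockref would cmpxchg */
		struct {
			uint16_t ticket_next;	/* ticket tail */
			uint16_t ticket_owner;	/* ticket head */
			int32_t  count;		/* lockref reference count */
		};
	};
	uint32_t holder;			/* owning vcpu, best-effort hint;
						   not part of the atomic word */
};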

I want to try sorting that out at some stage but haven't had a chance yet.

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 20:20                     ` Davidlohr Bueso
@ 2014-03-20 20:36                       ` Linus Torvalds
  -1 siblings, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2014-03-20 20:36 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Srikar Dronamraju, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, Mar 20, 2014 at 1:20 PM, Davidlohr Bueso <davidlohr@hp.com> wrote:
>
> I reverted commits 99b60ce6 (documentation) and b0c29f79 (the offending
> commit), and then I cleanly applied the equivalent ones from v3 of the
> series (which was already *tested* and ready for upstream until you
> suggested looking into the alternative spinlock approach):
>
> https://lkml.org/lkml/2013/12/19/624
> https://lkml.org/lkml/2013/12/19/630
>
> Assuming the atomics solves the issue, would you be willing to take this
> path? Any pending documentation fixes can be added afterwards. The
> important thing is that the actual code is well tested.

So my preference would be to do that "tested code" thing, but then
edit out the comment changes and boil it down to just the minimal code
changes. So that you can see what the patch actually *does*, without
it being hidden by the bulk of the patch just being the reverts of the
comment fixups.

In fact, I hope that if you do that, you end up with the patch I just
created by hand, and then we'd have come to the same situation two
different ways independently, and I'd be doubly happy for that extra
cross-checking of what went on.

And I would *not* want to do this as "two reverts and one patch to
re-do things like we used to", because that just makes the actual
change even harder to see. And that's partly because if we eventually
do decide that "hey, if we can do this using the ticket lock as a
counter, let's do it that way", then this *small* fixup patch ends up
showing the actual real differences between the two approaches.

Of course, right now we don't even have confirmation from Srikar that
the explicit "waiters" counter even fixes things on powerpc, so.. All
the testing that orginal patch had was also on x86, so if it's some
subtle memory ordering issue that we haven't figured out now, rather
than the ticket lock thing, all this discussion about which way to go
turns out to be entirely premature.

            Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-20 20:20                     ` Davidlohr Bueso
@ 2014-03-21  4:55                       ` Srikar Dronamraju
  -1 siblings, 0 replies; 54+ messages in thread
From: Srikar Dronamraju @ 2014-03-21  4:55 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Linus Torvalds, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

> 
> Ok, so a big reason why this patch doesn't apply cleanly after reverting
> is because *most* of the changes were done at the top of the file with
> regards to documenting the ordering guarantees, the actual code changes
> are quite minimal.
> 
> I reverted commits 99b60ce6 (documentation) and b0c29f79 (the offending
> commit), and then I cleanly applied the equivalent ones from v3 of the
> series (which was already *tested* and ready for upstream until you
> suggested looking into the alternative spinlock approach):
> 
> https://lkml.org/lkml/2013/12/19/624
> https://lkml.org/lkml/2013/12/19/630

I reverted commits 99b60ce6 and b0c29f79. Then applied the patches in
the above url. The last one had a reject but it was pretty
straightforward to resolve it. After this, specjbb completes. 

So reverting and applying v3 3/4 and 4/4 patches works for me.

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-21  4:55                       ` Srikar Dronamraju
@ 2014-03-21  5:24                         ` Linus Torvalds
  -1 siblings, 0 replies; 54+ messages in thread
From: Linus Torvalds @ 2014-03-21  5:24 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Davidlohr Bueso, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Thu, Mar 20, 2014 at 9:55 PM, Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
>
> I reverted commits 99b60ce6 and b0c29f79. Then applied the patches in
> the above url. The last one had a reject but it was pretty
> straightforward to resolve it. After this, specjbb completes.
>
> So reverting and applying v3 3/4 and 4/4 patches works for me.

Ok, I verified that the above ends up resulting in the same tree as
the minimal patch I sent out, modulo (a) some comments and (b) an
#ifdef CONFIG_SMP in futex_get_mm() that doesn't really matter.

So I committed the minimal patch with your tested-by.

             Linus

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-21  5:24                         ` Linus Torvalds
@ 2014-03-22  2:27                           ` Srikar Dronamraju
  -1 siblings, 0 replies; 54+ messages in thread
From: Srikar Dronamraju @ 2014-03-22  2:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney


> > So reverting and applying v3 3/4 and 4/4 patches works for me.
> 
> > Ok, I verified that the above ends up resulting in the same tree as
> the minimal patch I sent out, modulo (a) some comments and (b) an
> #ifdef CONFIG_SMP in futex_get_mm() that doesn't really matter.
> 
> So I committed the minimal patch with your tested-by.
> 

Just to be sure, I have verified with the latest mainline, with HEAD at
commit 08edb33c4 (Merge branch 'drm-fixes' of
git://people.freedesktop.org/~airlied/linux), which also has commit
11d4616bd0 ("futex: revert back to the explicit waiter counting code").

-- 
Thanks and Regards
Srikar Dronamraju


^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: Tasks stuck in futex code (in 3.14-rc6)
  2014-03-22  2:27                           ` Srikar Dronamraju
@ 2014-03-22  3:36                             ` Davidlohr Bueso
  -1 siblings, 0 replies; 54+ messages in thread
From: Davidlohr Bueso @ 2014-03-22  3:36 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Linus Torvalds, Peter Zijlstra, Thomas Gleixner, Ingo Molnar,
	LKML, ppc-dev, Benjamin Herrenschmidt, Paul Mackerras,
	Paul McKenney

On Sat, 2014-03-22 at 07:57 +0530, Srikar Dronamraju wrote:
> > > So reverting and applying v3 3/4 and 4/4 patches works for me.
> > 
> > Ok, I verified that the above ends up resulting in the same tree as
> > the minimal patch I sent out, modulo (a) some comments and (b) an
> > #ifdef CONFIG_SMP in futex_get_mm() that doesn't really matter.
> > 
> > So I committed the minimal patch with your tested-by.
> > 
> 
> Just to be sure, I have verified with the latest mainline, with HEAD at
> commit 08edb33c4 (Merge branch 'drm-fixes' of
> git://people.freedesktop.org/~airlied/linux), which also has commit
> 11d4616bd0 ("futex: revert back to the explicit waiter counting code").

Thanks for double checking.


^ permalink raw reply	[flat|nested] 54+ messages in thread

end of thread, other threads:[~2014-03-22  3:36 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-19 15:26 Tasks stuck in futex code (in 3.14-rc6) Srikar Dronamraju
2014-03-19 15:26 ` Srikar Dronamraju
2014-03-19 15:47 ` Peter Zijlstra
2014-03-19 15:47   ` Peter Zijlstra
2014-03-19 16:09   ` Srikar Dronamraju
2014-03-19 16:09     ` Srikar Dronamraju
2014-03-19 17:08   ` Peter Zijlstra
2014-03-19 17:08     ` Peter Zijlstra
2014-03-19 18:06     ` Davidlohr Bueso
2014-03-19 18:06       ` Davidlohr Bueso
2014-03-20  5:33     ` Srikar Dronamraju
2014-03-20  5:33       ` Srikar Dronamraju
2014-03-20  5:56       ` Davidlohr Bueso
2014-03-20  5:56         ` Davidlohr Bueso
2014-03-20 10:08         ` Srikar Dronamraju
2014-03-20 10:08           ` Srikar Dronamraju
2014-03-20 15:06           ` Davidlohr Bueso
2014-03-20 15:06             ` Davidlohr Bueso
2014-03-20 16:31         ` Davidlohr Bueso
2014-03-20 16:31           ` Davidlohr Bueso
2014-03-20 20:23           ` Benjamin Herrenschmidt
2014-03-20 20:23             ` Benjamin Herrenschmidt
2014-03-20 16:41         ` Linus Torvalds
2014-03-20 16:41           ` Linus Torvalds
2014-03-20 17:18           ` Davidlohr Bueso
2014-03-20 17:18             ` Davidlohr Bueso
2014-03-20 17:42             ` Linus Torvalds
2014-03-20 17:42               ` Linus Torvalds
2014-03-20 18:03               ` Davidlohr Bueso
2014-03-20 18:03                 ` Davidlohr Bueso
2014-03-20 18:16                 ` Linus Torvalds
2014-03-20 18:16                   ` Linus Torvalds
2014-03-20 18:36             ` Linus Torvalds
2014-03-20 18:36               ` Linus Torvalds
2014-03-20 19:08               ` Davidlohr Bueso
2014-03-20 19:08                 ` Davidlohr Bueso
2014-03-20 19:25                 ` Linus Torvalds
2014-03-20 19:25                   ` Linus Torvalds
2014-03-20 20:20                   ` Davidlohr Bueso
2014-03-20 20:20                     ` Davidlohr Bueso
2014-03-20 20:36                     ` Linus Torvalds
2014-03-20 20:36                       ` Linus Torvalds
2014-03-21  4:55                     ` Srikar Dronamraju
2014-03-21  4:55                       ` Srikar Dronamraju
2014-03-21  5:24                       ` Linus Torvalds
2014-03-21  5:24                         ` Linus Torvalds
2014-03-22  2:27                         ` Srikar Dronamraju
2014-03-22  2:27                           ` Srikar Dronamraju
2014-03-22  3:36                           ` Davidlohr Bueso
2014-03-22  3:36                             ` Davidlohr Bueso
2014-03-20  7:23       ` Peter Zijlstra
2014-03-20  7:23         ` Peter Zijlstra
2014-03-19 16:04 ` Linus Torvalds
2014-03-19 16:04   ` Linus Torvalds
