* [patch 0/4] rtmutex: Plug unlock vs. requeue race
@ 2016-11-30 21:04 Thomas Gleixner
  2016-11-30 21:04 ` [patch 1/4] rtmutex: Prevent dequeue vs. unlock race Thomas Gleixner
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Thomas Gleixner @ 2016-11-30 21:04 UTC (permalink / raw)
  To: LKML
  Cc: David Daney, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Sebastian Siewior, Will Deacon, Mark Rutland

The following series plugs a subtle race and robustifies the code
further. Aside from that, it adds commentary about lockless operations and
removes a confusing extra define.

Thanks,

	tglx


* [patch 1/4] rtmutex: Prevent dequeue vs. unlock race
  2016-11-30 21:04 [patch 0/4] rtmutex: Plug unlock vs. requeue race Thomas Gleixner
@ 2016-11-30 21:04 ` Thomas Gleixner
  2016-12-01 17:56   ` David Daney
                     ` (3 more replies)
  2016-11-30 21:04 ` [patch 2/4] rtmutex: Use READ_ONCE() in rt_mutex_owner() Thomas Gleixner
                   ` (3 subsequent siblings)
  4 siblings, 4 replies; 14+ messages in thread
From: Thomas Gleixner @ 2016-11-30 21:04 UTC (permalink / raw)
  To: LKML
  Cc: David Daney, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Sebastian Siewior, Will Deacon, Mark Rutland, stable

[-- Attachment #1: rtmutex--Prevent-dequeue-unlock-race.patch --]
[-- Type: text/plain, Size: 4941 bytes --]

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0		CPU1		CPU2

l->owner=T1
		rt_mutex_lock(l)
		lock(l->wait_lock)
		l->owner = T1 | HAS_WAITERS;
		enqueue(T2)
		boost()
		  unlock(l->wait_lock)
		schedule()

				rt_mutex_lock(l)
				lock(l->wait_lock)
				l->owner = T1 | HAS_WAITERS;
				enqueue(T3)
				boost()
				  unlock(l->wait_lock)
				schedule()
		signal(->T2)	signal(->T3)
		lock(l->wait_lock)
		dequeue(T2)
		deboost()
		  unlock(l->wait_lock)
				lock(l->wait_lock)
				dequeue(T3)
				  ===> wait list is now empty
				deboost()
				 unlock(l->wait_lock)
		lock(l->wait_lock)
		fixup_rt_mutex_waiters()
		  if (wait_list_empty(l)) {
		    owner = l->owner & ~HAS_WAITERS;
 		    l->owner = owner
		     ==> l->owner = T1
		  }

				lock(l->wait_lock)
rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
				  if (wait_list_empty(l)) {
				    owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
 ===> Success (l->owner = NULL)
				    l->owner = owner
				     ==> l->owner = T1
				  }

That means the problem is caused by fixup_rt_mutex_waiters(), which does a
non-atomic RMW to clear the waiters bit unconditionally when there are no
waiters in the rtmutex's rbtree.
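
For reference, the pre-fix code performs that clear as a plain load/store
pair on lock->owner. A simplified sketch (not the verbatim kernel source)
of what the old path boils down to:

static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
{
	if (!rt_mutex_has_waiters(lock))
		/* plain load + plain store, no atomicity vs. the unlock cmpxchg() */
		lock->owner = (struct task_struct *)
			((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
}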

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation, then the previous owner, which just unlocked
the rtmutex, is set as the owner again when the write takes place after the
successful cmpxchg().

The solution is rather trivial: Verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.
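
For context, the fastpath unlock attempt mentioned above is essentially a
single release-ordered cmpxchg() on the owner field. A minimal sketch
(function name and exact shape are assumptions, not the kernel's code):

/*
 * Sketch of the fastpath unlock: it can only succeed while lock->owner is
 * exactly the current task, i.e. the waiters bit is clear. If the bit is
 * set, the cmpxchg() fails and the unlocker takes the slowpath under
 * lock->wait_lock.
 */
static bool rt_mutex_fastpath_unlock_sketch(struct rt_mutex *lock,
					    struct task_struct *curr)
{
	return cmpxchg_release(&lock->owner, curr, NULL) == curr;
}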

It's remarkable that the test program provided by David triggers really
quickly on ARM64 and MIPS64, but refuses to reproduce on x86-64, even though
the problem exists there as well. That refusal might explain why this was
not discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which made it possible to decode this subtle problem.

Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Reported-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
---
 kernel/locking/rtmutex.c |   68 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 2 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -65,8 +65,72 @@ static inline void clear_rt_mutex_waiter
 
 static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
 {
-	if (!rt_mutex_has_waiters(lock))
-		clear_rt_mutex_waiters(lock);
+	unsigned long owner, *p = (unsigned long *) &lock->owner;
+
+	if (rt_mutex_has_waiters(lock))
+		return;
+
+	/*
+	 * The rbtree has no waiters enqueued, now make sure that the
+	 * lock->owner still has the waiters bit set, otherwise the
+	 * following can happen:
+	 *
+	 * CPU 0	CPU 1		CPU2
+	 * l->owner=T1
+	 *		rt_mutex_lock(l)
+	 *		lock(l->lock)
+	 *		l->owner = T1 | HAS_WAITERS;
+	 *		enqueue(T2)
+	 *		boost()
+	 *		  unlock(l->lock)
+	 *		block()
+	 *
+	 *				rt_mutex_lock(l)
+	 *				lock(l->lock)
+	 *				l->owner = T1 | HAS_WAITERS;
+	 *				enqueue(T3)
+	 *				boost()
+	 *				  unlock(l->lock)
+	 *				block()
+	 *		signal(->T2)	signal(->T3)
+	 *		lock(l->lock)
+	 *		dequeue(T2)
+	 *		deboost()
+	 *		  unlock(l->lock)
+	 *				lock(l->lock)
+	 *				dequeue(T3)
+	 *				 ==> wait list is empty
+	 *				deboost()
+	 *				 unlock(l->lock)
+	 *		lock(l->lock)
+	 *		fixup_rt_mutex_waiters()
+	 *		  if (wait_list_empty(l)) {
+	 *		    owner = l->owner & ~HAS_WAITERS;
+	 *		    l->owner = owner
+	 *		      ==> l->owner = T1
+	 *		  }
+	 *				lock(l->lock)
+	 * rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
+	 *				  if (wait_list_empty(l)) {
+	 *				    owner = l->owner & ~HAS_WAITERS;
+	 * cmpxchg(l->owner, T1, NULL)
+	 *  ===> Success (l->owner = NULL)
+	 *
+	 *				    l->owner = owner
+	 *				      ==> l->owner = T1
+	 *				  }
+	 *
+	 * With the check for the waiter bit in place T3 on CPU2 will not
+	 * overwrite. All tasks fiddling with the waiters bit are
+	 * serialized by l->lock, so nothing else can modify the waiters
+	 * bit. If the bit is set then nothing can change l->owner either
+	 * so the simple RMW is safe. The cmpxchg() will simply fail if it
+	 * happens in the middle of the RMW because the waiters bit is
+	 * still set.
+	 */
+	owner = READ_ONCE(*p);
+	if (owner & RT_MUTEX_HAS_WAITERS)
+		WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
 }
 
 /*


* [patch 2/4] rtmutex: Use READ_ONCE() in rt_mutex_owner()
  2016-11-30 21:04 [patch 0/4] rtmutex: Plug unlock vs. requeue race Thomas Gleixner
  2016-11-30 21:04 ` [patch 1/4] rtmutex: Prevent dequeue vs. unlock race Thomas Gleixner
@ 2016-11-30 21:04 ` Thomas Gleixner
  2016-12-02 10:45   ` [tip:locking/core] locking/rtmutex: " tip-bot for Thomas Gleixner
  2016-11-30 21:04 ` [patch 3/4] rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL Thomas Gleixner
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2016-11-30 21:04 UTC (permalink / raw)
  To: LKML
  Cc: David Daney, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Sebastian Siewior, Will Deacon, Mark Rutland

[-- Attachment #1: rtmutex--Use-READ_ONCE---in-rt_mutex_owner--.patch --]
[-- Type: text/plain, Size: 1010 bytes --]

While debugging the rtmutex unlock vs. dequeue race, Will suggested using
READ_ONCE() in rt_mutex_owner() as it might race against the
cmpxchg_release() in unlock_rt_mutex_safe().

Will: "It's a minor thing which will most likely not matter in practice"

A careful search did not unearth an actual problem in today's code, but it's
better to be safe than surprised.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/locking/rtmutex_common.h |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -75,8 +75,9 @@ task_top_pi_waiter(struct task_struct *p
 
 static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
 {
-	return (struct task_struct *)
-		((unsigned long)lock->owner & ~RT_MUTEX_OWNER_MASKALL);
+	unsigned long owner = (unsigned long) READ_ONCE(lock->owner);
+
+	return (struct task_struct *) (owner & ~RT_MUTEX_OWNER_MASKALL);
 }
 
 /*


* [patch 3/4] rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL
  2016-11-30 21:04 [patch 0/4] rtmutex: Plug unlock vs. requeue race Thomas Gleixner
  2016-11-30 21:04 ` [patch 1/4] rtmutex: Prevent dequeue vs. unlock race Thomas Gleixner
  2016-11-30 21:04 ` [patch 2/4] rtmutex: Use READ_ONCE() in rt_mutex_owner() Thomas Gleixner
@ 2016-11-30 21:04 ` Thomas Gleixner
  2016-12-02 10:46   ` [tip:locking/core] locking/rtmutex: " tip-bot for Thomas Gleixner
  2016-11-30 21:04 ` [patch 4/4] rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() Thomas Gleixner
  2016-12-01 18:33 ` [patch 0/4] rtmutex: Plug unlock vs. requeue race Peter Zijlstra
  4 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2016-11-30 21:04 UTC (permalink / raw)
  To: LKML
  Cc: David Daney, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Sebastian Siewior, Will Deacon, Mark Rutland

[-- Attachment #1: rtmutex--Get-rid-of-RT_MUTEX_OWNER_MASKALL.patch --]
[-- Type: text/plain, Size: 1031 bytes --]

This is a leftover from the original rtmutex implementation, which used
both bit0 and bit1 in the owner pointer. Commit 8161239a8bcc ("rtmutex:
Simplify PI algorithm and make highest prio task get lock") removed the
usage of bit1, but kept the extra mask around. This is confusing at best.

Remove it and just use RT_MUTEX_HAS_WAITERS for the masking.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/locking/rtmutex_common.h |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -71,13 +71,12 @@ task_top_pi_waiter(struct task_struct *p
  * lock->owner state tracking:
  */
 #define RT_MUTEX_HAS_WAITERS	1UL
-#define RT_MUTEX_OWNER_MASKALL	1UL
 
 static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
 {
 	unsigned long owner = (unsigned long) READ_ONCE(lock->owner);
 
-	return (struct task_struct *) (owner & ~RT_MUTEX_OWNER_MASKALL);
+	return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS);
 }
 
 /*


* [patch 4/4] rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked()
  2016-11-30 21:04 [patch 0/4] rtmutex: Plug unlock vs. requeue race Thomas Gleixner
                   ` (2 preceding siblings ...)
  2016-11-30 21:04 ` [patch 3/4] rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL Thomas Gleixner
@ 2016-11-30 21:04 ` Thomas Gleixner
  2016-12-02 10:46   ` [tip:locking/core] locking/rtmutex: " tip-bot for Thomas Gleixner
  2016-12-01 18:33 ` [patch 0/4] rtmutex: Plug unlock vs. requeue race Peter Zijlstra
  4 siblings, 1 reply; 14+ messages in thread
From: Thomas Gleixner @ 2016-11-30 21:04 UTC (permalink / raw)
  To: LKML
  Cc: David Daney, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Sebastian Siewior, Will Deacon, Mark Rutland

[-- Attachment #1: rtmutex--Explain-locking-rules-for-rt_mutex_proxy_unlock.patch --]
[-- Type: text/plain, Size: 1946 bytes --]

While debugging the unlock vs. dequeue race which resulted in state
corruption of futexes, the lockless nature of rt_mutex_proxy_unlock()
caused some confusion.

Add commentary to explain why it is safe to do this locklessly. Add matching
comments to rt_mutex_init_proxy_locked() for completeness' sake.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/locking/rtmutex.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1617,11 +1617,15 @@ EXPORT_SYMBOL_GPL(__rt_mutex_init);
  * rt_mutex_init_proxy_locked - initialize and lock a rt_mutex on behalf of a
  *				proxy owner
  *
- * @lock: 	the rt_mutex to be locked
+ * @lock:	the rt_mutex to be locked
  * @proxy_owner:the task to set as owner
  *
  * No locking. Caller has to do serializing itself
- * Special API call for PI-futex support
+ *
+ * Special API call for PI-futex support. This initializes the rtmutex and
+ * assigns it to @proxy_owner. Concurrent operations on the rtmutex are not
+ * possible at this point because the pi_state which contains the rtmutex
+ * is not yet visible to other tasks.
  */
 void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
 				struct task_struct *proxy_owner)
@@ -1635,10 +1639,14 @@ void rt_mutex_init_proxy_locked(struct r
 /**
  * rt_mutex_proxy_unlock - release a lock on behalf of owner
  *
- * @lock: 	the rt_mutex to be locked
+ * @lock:	the rt_mutex to be locked
  *
  * No locking. Caller has to do serializing itself
- * Special API call for PI-futex support
+ *
+ * Special API call for PI-futex support. This merrily cleans up the rtmutex
+ * (debugging) state. Concurrent operations on this rt_mutex are not
+ * possible because it belongs to the pi_state which is about to be freed
+ * and it is no longer visible to other tasks.
  */
 void rt_mutex_proxy_unlock(struct rt_mutex *lock,
 			   struct task_struct *proxy_owner)


* Re: [patch 1/4] rtmutex: Prevent dequeue vs. unlock race
  2016-11-30 21:04 ` [patch 1/4] rtmutex: Prevent dequeue vs. unlock race Thomas Gleixner
@ 2016-12-01 17:56   ` David Daney
  2016-12-01 18:25   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: David Daney @ 2016-12-01 17:56 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, David Daney, Peter Zijlstra, Ingo Molnar, Steven Rostedt,
	Sebastian Siewior, Will Deacon, Mark Rutland, stable

On 11/30/2016 01:04 PM, Thomas Gleixner wrote:
> David reported a futex/rtmutex state corruption. It's caused by the
> following problem:
>
> CPU0		CPU1		CPU2
>
> l->owner=T1
> 		rt_mutex_lock(l)
> 		lock(l->wait_lock)
> 		l->owner = T1 | HAS_WAITERS;
> 		enqueue(T2)
> 		boost()
> 		  unlock(l->wait_lock)
> 		schedule()
>
> 				rt_mutex_lock(l)
> 				lock(l->wait_lock)
> 				l->owner = T1 | HAS_WAITERS;
> 				enqueue(T3)
> 				boost()
> 				  unlock(l->wait_lock)
> 				schedule()
> 		signal(->T2)	signal(->T3)
> 		lock(l->wait_lock)
> 		dequeue(T2)
> 		deboost()
> 		  unlock(l->wait_lock)
> 				lock(l->wait_lock)
> 				dequeue(T3)
> 				  ===> wait list is now empty
> 				deboost()
> 				 unlock(l->wait_lock)
> 		lock(l->wait_lock)
> 		fixup_rt_mutex_waiters()
> 		  if (wait_list_empty(l)) {
> 		    owner = l->owner & ~HAS_WAITERS;
>   		    l->owner = owner
> 		     ==> l->owner = T1
> 		  }
>
> 				lock(l->wait_lock)
> rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
> 				  if (wait_list_empty(l)) {
> 				    owner = l->owner & ~HAS_WAITERS;
> cmpxchg(l->owner, T1, NULL)
>   ===> Success (l->owner = NULL)
> 				    l->owner = owner
> 				     ==> l->owner = T1
> 				  }
>
> That means the problem is caused by fixup_rt_mutex_waiters() which does the
> RMW to clear the waiters bit unconditionally when there are no waiters in
> the rtmutexes rbtree.
>
> This can be fatal: A concurrent unlock can release the rtmutex in the
> fastpath because the waiters bit is not set. If the cmpxchg() gets in the
> middle of the RMW operation then the previous owner, which just unlocked
> the rtmutex is set as the owner again when the write takes place after the
> successfull cmpxchg().
>
> The solution is rather trivial: Verify that the owner member of the rtmutex
> has the waiters bit set before clearing it. This does not require a
> cmpxchg() or other atomic operations because the waiters bit can only be
> set and cleared with the rtmutex wait_lock held. It's also safe against the
> fast path unlock attempt. The unlock attempt via cmpxchg() will either see
> the bit set and take the slowpath or see the bit cleared and release it
> atomically in the fastpath.
>
> It's remarkable that the test program provided by David triggers on ARM64
> and MIPS64 really quick, but it refuses to reproduce on x8664, while the
> problem exists there as well. That refusal might explain that this got not
> discovered earlier despite the bug existing from day one of the rtmutex
> implementation more than 10 years ago.
>
> Thanks to David for meticulously instrumenting the code and providing the
> information which allowed to decode this subtle problem.
>
> Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
> Reported-by: David Daney<ddaney@caviumnetworks.com>
> Signed-off-by: Thomas Gleixner<tglx@linutronix.de>
> Cc:stable@vger.kernel.org

FWIW:

Tested-by: David Daney <david.daney@cavium.com>

... on arm64 and mips64 where it fixes the failures we were seeing.

Thanks to Thomas for taking the time to work through this thing.

David Daney



> ---
>   kernel/locking/rtmutex.c |   68 +++++++++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 66 insertions(+), 2 deletions(-)
>
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -65,8 +65,72 @@ static inline void clear_rt_mutex_waiter
>
>   static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
>   {
> -	if (!rt_mutex_has_waiters(lock))
> -		clear_rt_mutex_waiters(lock);
> +	unsigned long owner, *p = (unsigned long *) &lock->owner;
> +
> +	if (rt_mutex_has_waiters(lock))
> +		return;
> +
> +	/*
> +	 * The rbtree has no waiters enqueued, now make sure that the
> +	 * lock->owner still has the waiters bit set, otherwise the
> +	 * following can happen:
> +	 *
> +	 * CPU 0	CPU 1		CPU2
> +	 * l->owner=T1
> +	 *		rt_mutex_lock(l)
> +	 *		lock(l->lock)
> +	 *		l->owner = T1 | HAS_WAITERS;
> +	 *		enqueue(T2)
> +	 *		boost()
> +	 *		  unlock(l->lock)
> +	 *		block()
> +	 *
> +	 *				rt_mutex_lock(l)
> +	 *				lock(l->lock)
> +	 *				l->owner = T1 | HAS_WAITERS;
> +	 *				enqueue(T3)
> +	 *				boost()
> +	 *				  unlock(l->lock)
> +	 *				block()
> +	 *		signal(->T2)	signal(->T3)
> +	 *		lock(l->lock)
> +	 *		dequeue(T2)
> +	 *		deboost()
> +	 *		  unlock(l->lock)
> +	 *				lock(l->lock)
> +	 *				dequeue(T3)
> +	 *				 ==> wait list is empty
> +	 *				deboost()
> +	 *				 unlock(l->lock)
> +	 *		lock(l->lock)
> +	 *		fixup_rt_mutex_waiters()
> +	 *		  if (wait_list_empty(l) {
> +	 *		    l->owner = owner
> +	 *		    owner = l->owner & ~HAS_WAITERS;
> +	 *		      ==> l->owner = T1
> +	 *		  }
> +	 *				lock(l->lock)
> +	 * rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
> +	 *				  if (wait_list_empty(l) {
> +	 *				    owner = l->owner & ~HAS_WAITERS;
> +	 * cmpxchg(l->owner, T1, NULL)
> +	 *  ===> Success (l->owner = NULL)
> +	 *
> +	 *				    l->owner = owner
> +	 *				      ==> l->owner = T1
> +	 *				  }
> +	 *
> +	 * With the check for the waiter bit in place T3 on CPU2 will not
> +	 * overwrite. All tasks fiddling with the waiters bit are
> +	 * serialized by l->lock, so nothing else can modify the waiters
> +	 * bit. If the bit is set then nothing can change l->owner either
> +	 * so the simple RMW is safe. The cmpxchg() will simply fail if it
> +	 * happens in the middle of the RMW because the waiters bit is
> +	 * still set.
> +	 */
> +	owner = READ_ONCE(*p);
> +	if (owner & RT_MUTEX_HAS_WAITERS)
> +		WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
>   }
>
>   /*
>
>


* Re: [patch 1/4] rtmutex: Prevent dequeue vs. unlock race
  2016-11-30 21:04 ` [patch 1/4] rtmutex: Prevent dequeue vs. unlock race Thomas Gleixner
  2016-12-01 17:56   ` David Daney
@ 2016-12-01 18:25   ` Peter Zijlstra
  2016-12-02  8:18     ` Thomas Gleixner
  2016-12-02  0:53   ` Steven Rostedt
  2016-12-02 10:45   ` [tip:locking/core] locking/rtmutex: " tip-bot for Thomas Gleixner
  3 siblings, 1 reply; 14+ messages in thread
From: Peter Zijlstra @ 2016-12-01 18:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, David Daney, Ingo Molnar, Steven Rostedt,
	Sebastian Siewior, Will Deacon, Mark Rutland, stable

On Wed, Nov 30, 2016 at 09:04:41PM -0000, Thomas Gleixner wrote:
> It's remarkable that the test program provided by David triggers on ARM64
> and MIPS64 really quick, but it refuses to reproduce on x8664, while the
> problem exists there as well. That refusal might explain that this got not
> discovered earlier despite the bug existing from day one of the rtmutex
> implementation more than 10 years ago.

> -		clear_rt_mutex_waiters(lock);

So that compiles into:

	andq   $0xfffffffffffffffe,0x48(%rbx)

Which is a RmW memop. Now, per the architecture documents, we can decompose
that into a normal load-store and the race exists. But I would not be
surprised if that starts with the cacheline in exclusive mode (because
it knows it will do the store). Which makes it a very tiny race indeed.
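
Spelled out, that decomposition looks roughly like this (an illustration of
the window only, not actual kernel code or compiler output):

	unsigned long owner = (unsigned long)l->owner;	/* load                  */
	owner &= ~RT_MUTEX_HAS_WAITERS;			/* modify, in a register */
	/* <-- window: the unlocker's cmpxchg(l->owner, T1, NULL) can succeed here */
	l->owner = (struct task_struct *)owner;		/* store: writes T1 back over NULL */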


* Re: [patch 0/4] rtmutex: Plug unlock vs. requeue race
  2016-11-30 21:04 [patch 0/4] rtmutex: Plug unlock vs. requeue race Thomas Gleixner
                   ` (3 preceding siblings ...)
  2016-11-30 21:04 ` [patch 4/4] rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() Thomas Gleixner
@ 2016-12-01 18:33 ` Peter Zijlstra
  4 siblings, 0 replies; 14+ messages in thread
From: Peter Zijlstra @ 2016-12-01 18:33 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, David Daney, Ingo Molnar, Steven Rostedt,
	Sebastian Siewior, Will Deacon, Mark Rutland

On Wed, Nov 30, 2016 at 09:04:40PM -0000, Thomas Gleixner wrote:
> The following series plugs a subtle race and robustifies the code
> further. Aside of that it adds commentry about lockless operations and
> removes a confusing extra define.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>


* Re: [patch 1/4] rtmutex: Prevent dequeue vs. unlock race
  2016-11-30 21:04 ` [patch 1/4] rtmutex: Prevent dequeue vs. unlock race Thomas Gleixner
  2016-12-01 17:56   ` David Daney
  2016-12-01 18:25   ` Peter Zijlstra
@ 2016-12-02  0:53   ` Steven Rostedt
  2016-12-02 10:45   ` [tip:locking/core] locking/rtmutex: " tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 14+ messages in thread
From: Steven Rostedt @ 2016-12-02  0:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, David Daney, Peter Zijlstra, Ingo Molnar,
	Sebastian Siewior, Will Deacon, Mark Rutland, stable

On Wed, 30 Nov 2016 21:04:41 -0000
Thomas Gleixner <tglx@linutronix.de> wrote:

> David reported a futex/rtmutex state corruption. It's caused by the
> following problem:
> 
> CPU0		CPU1		CPU2
> 
> l->owner=T1
> 		rt_mutex_lock(l)
> 		lock(l->wait_lock)
> 		l->owner = T1 | HAS_WAITERS;
> 		enqueue(T2)
> 		boost()
> 		  unlock(l->wait_lock)
> 		schedule()
> 
> 				rt_mutex_lock(l)
> 				lock(l->wait_lock)
> 				l->owner = T1 | HAS_WAITERS;
> 				enqueue(T3)
> 				boost()
> 				  unlock(l->wait_lock)
> 				schedule()
> 		signal(->T2)	signal(->T3)
> 		lock(l->wait_lock)
> 		dequeue(T2)
> 		deboost()
> 		  unlock(l->wait_lock)
> 				lock(l->wait_lock)
> 				dequeue(T3)
> 				  ===> wait list is now empty  
> 				deboost()
> 				 unlock(l->wait_lock)
> 		lock(l->wait_lock)
> 		fixup_rt_mutex_waiters()
> 		  if (wait_list_empty(l)) {
> 		    owner = l->owner & ~HAS_WAITERS;
>  		    l->owner = owner
> 		     ==> l->owner = T1  
> 		  }
> 
> 				lock(l->wait_lock)
> rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
> 				  if (wait_list_empty(l)) {
> 				    owner = l->owner & ~HAS_WAITERS;
> cmpxchg(l->owner, T1, NULL)
>  ===> Success (l->owner = NULL)  
> 				    l->owner = owner
> 				     ==> l->owner = T1  
> 				  }
> 
> That means the problem is caused by fixup_rt_mutex_waiters() which does the
> RMW to clear the waiters bit unconditionally when there are no waiters in
> the rtmutexes rbtree.
> 
> This can be fatal: A concurrent unlock can release the rtmutex in the
> fastpath because the waiters bit is not set. If the cmpxchg() gets in the
> middle of the RMW operation then the previous owner, which just unlocked
> the rtmutex is set as the owner again when the write takes place after the
> successfull cmpxchg().
> 
> The solution is rather trivial: Verify that the owner member of the rtmutex
> has the waiters bit set before clearing it. This does not require a
> cmpxchg() or other atomic operations because the waiters bit can only be
> set and cleared with the rtmutex wait_lock held. It's also safe against the
> fast path unlock attempt. The unlock attempt via cmpxchg() will either see
> the bit set and take the slowpath or see the bit cleared and release it
> atomically in the fastpath.
> 
> It's remarkable that the test program provided by David triggers on ARM64
> and MIPS64 really quick, but it refuses to reproduce on x8664, while the
> problem exists there as well. That refusal might explain that this got not
> discovered earlier despite the bug existing from day one of the rtmutex
> implementation more than 10 years ago.

Because x86 is awesome! ;-)

> 
> Thanks to David for meticulously instrumenting the code and providing the
> information which allowed to decode this subtle problem.
> 
> Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
> Reported-by: David Daney <ddaney@caviumnetworks.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
>  kernel/locking/rtmutex.c |   68 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 66 insertions(+), 2 deletions(-)
> 
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -65,8 +65,72 @@ static inline void clear_rt_mutex_waiter
>  
>  static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
>  {
> -	if (!rt_mutex_has_waiters(lock))
> -		clear_rt_mutex_waiters(lock);

Hmm, now clear_rt_mutex_waiters() has only one user, but luckily it's done
in the slow unlock case where the wait lock is held and it's the owner doing
the update. Perhaps that function should go away and we should just open
code it in the one use case, because it's part of the danger that happened
here, and we don't want it used outside of an unlock.
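
Roughly, the open-coded variant at that single call site might look like the
following (purely a sketch of the suggestion; the exact call site and names
are assumptions, not code from the tree):

	/*
	 * Hypothetical open-coding of the waiters bit clear in the slow
	 * unlock path. Assumes the caller is the owner and holds
	 * lock->wait_lock, so a plain store is sufficient here.
	 */
	lockdep_assert_held(&lock->wait_lock);
	lock->owner = (struct task_struct *)
			((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);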

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve


> +	unsigned long owner, *p = (unsigned long *) &lock->owner;
> +
> +	if (rt_mutex_has_waiters(lock))
> +		return;
> +
> +	/*
> +	 * The rbtree has no waiters enqueued, now make sure that the
> +	 * lock->owner still has the waiters bit set, otherwise the
> +	 * following can happen:
> +	 *
> +	 * CPU 0	CPU 1		CPU2
> +	 * l->owner=T1
> +	 *		rt_mutex_lock(l)
> +	 *		lock(l->lock)
> +	 *		l->owner = T1 | HAS_WAITERS;
> +	 *		enqueue(T2)
> +	 *		boost()
> +	 *		  unlock(l->lock)
> +	 *		block()
> +	 *
> +	 *				rt_mutex_lock(l)
> +	 *				lock(l->lock)
> +	 *				l->owner = T1 | HAS_WAITERS;
> +	 *				enqueue(T3)
> +	 *				boost()
> +	 *				  unlock(l->lock)
> +	 *				block()
> +	 *		signal(->T2)	signal(->T3)
> +	 *		lock(l->lock)
> +	 *		dequeue(T2)
> +	 *		deboost()
> +	 *		  unlock(l->lock)
> +	 *				lock(l->lock)
> +	 *				dequeue(T3)
> +	 *				 ==> wait list is empty
> +	 *				deboost()
> +	 *				 unlock(l->lock)
> +	 *		lock(l->lock)
> +	 *		fixup_rt_mutex_waiters()
> +	 *		  if (wait_list_empty(l) {
> +	 *		    l->owner = owner
> +	 *		    owner = l->owner & ~HAS_WAITERS;
> +	 *		      ==> l->owner = T1
> +	 *		  }
> +	 *				lock(l->lock)
> +	 * rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
> +	 *				  if (wait_list_empty(l) {
> +	 *				    owner = l->owner & ~HAS_WAITERS;
> +	 * cmpxchg(l->owner, T1, NULL)
> +	 *  ===> Success (l->owner = NULL)
> +	 *
> +	 *				    l->owner = owner
> +	 *				      ==> l->owner = T1
> +	 *				  }
> +	 *
> +	 * With the check for the waiter bit in place T3 on CPU2 will not
> +	 * overwrite. All tasks fiddling with the waiters bit are
> +	 * serialized by l->lock, so nothing else can modify the waiters
> +	 * bit. If the bit is set then nothing can change l->owner either
> +	 * so the simple RMW is safe. The cmpxchg() will simply fail if it
> +	 * happens in the middle of the RMW because the waiters bit is
> +	 * still set.
> +	 */
> +	owner = READ_ONCE(*p);
> +	if (owner & RT_MUTEX_HAS_WAITERS)
> +		WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
>  }
>  
>  /*
> 


* Re: [patch 1/4] rtmutex: Prevent dequeue vs. unlock race
  2016-12-01 18:25   ` Peter Zijlstra
@ 2016-12-02  8:18     ` Thomas Gleixner
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Gleixner @ 2016-12-02  8:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, David Daney, Ingo Molnar, Steven Rostedt,
	Sebastian Siewior, Will Deacon, Mark Rutland, stable

On Thu, 1 Dec 2016, Peter Zijlstra wrote:

> On Wed, Nov 30, 2016 at 09:04:41PM -0000, Thomas Gleixner wrote:
> > It's remarkable that the test program provided by David triggers on ARM64
> > and MIPS64 really quick, but it refuses to reproduce on x8664, while the
> > problem exists there as well. That refusal might explain that this got not
> > discovered earlier despite the bug existing from day one of the rtmutex
> > implementation more than 10 years ago.
> 
> > -		clear_rt_mutex_waiters(lock);
> 
> So that compiles into:
> 
> 	andq   $0xfffffffffffffffe,0x48(%rbx)
> 
> With is a RmW memop. Now per the architecture documents we can decompose
> that into a normal load-store and the race exists. But I would not be
> surprised if that starts with the cacheline in exclusive mode (because
> it knows it will do the store). Which makes it a very tiny race indeed.

If it really takes the cacheline exclusive right away, then there is no
race because the cmpxchg has to wait for release and will see the store.
If the cmpxchg comes first the RmW will see the new value.

Fun stuff, isn't it?

	tglx


* [tip:locking/core] locking/rtmutex: Prevent dequeue vs. unlock race
  2016-11-30 21:04 ` [patch 1/4] rtmutex: Prevent dequeue vs. unlock race Thomas Gleixner
                     ` (2 preceding siblings ...)
  2016-12-02  0:53   ` Steven Rostedt
@ 2016-12-02 10:45   ` tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-12-02 10:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: will.deacon, ddaney, tglx, hpa, peterz, bigeasy, linux-kernel,
	mingo, rostedt, torvalds, mark.rutland, david.daney

Commit-ID:  dbb26055defd03d59f678cb5f2c992abe05b064a
Gitweb:     http://git.kernel.org/tip/dbb26055defd03d59f678cb5f2c992abe05b064a
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 30 Nov 2016 21:04:41 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 2 Dec 2016 11:13:26 +0100

locking/rtmutex: Prevent dequeue vs. unlock race

David reported a futex/rtmutex state corruption. It's caused by the
following problem:

CPU0		CPU1		CPU2

l->owner=T1
		rt_mutex_lock(l)
		lock(l->wait_lock)
		l->owner = T1 | HAS_WAITERS;
		enqueue(T2)
		boost()
		  unlock(l->wait_lock)
		schedule()

				rt_mutex_lock(l)
				lock(l->wait_lock)
				l->owner = T1 | HAS_WAITERS;
				enqueue(T3)
				boost()
				  unlock(l->wait_lock)
				schedule()
		signal(->T2)	signal(->T3)
		lock(l->wait_lock)
		dequeue(T2)
		deboost()
		  unlock(l->wait_lock)
				lock(l->wait_lock)
				dequeue(T3)
				  ===> wait list is now empty
				deboost()
				 unlock(l->wait_lock)
		lock(l->wait_lock)
		fixup_rt_mutex_waiters()
		  if (wait_list_empty(l)) {
		    owner = l->owner & ~HAS_WAITERS;
		    l->owner = owner
		     ==> l->owner = T1
		  }

				lock(l->wait_lock)
rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
				  if (wait_list_empty(l)) {
				    owner = l->owner & ~HAS_WAITERS;
cmpxchg(l->owner, T1, NULL)
 ===> Success (l->owner = NULL)
				    l->owner = owner
				     ==> l->owner = T1
				  }

That means the problem is caused by fixup_rt_mutex_waiters(), which does a
non-atomic RMW to clear the waiters bit unconditionally when there are no
waiters in the rtmutex's rbtree.

This can be fatal: A concurrent unlock can release the rtmutex in the
fastpath because the waiters bit is not set. If the cmpxchg() gets in the
middle of the RMW operation, then the previous owner, which just unlocked
the rtmutex, is set as the owner again when the write takes place after the
successful cmpxchg().

The solution is rather trivial: verify that the owner member of the rtmutex
has the waiters bit set before clearing it. This does not require a
cmpxchg() or other atomic operations because the waiters bit can only be
set and cleared with the rtmutex wait_lock held. It's also safe against the
fast path unlock attempt. The unlock attempt via cmpxchg() will either see
the bit set and take the slowpath or see the bit cleared and release it
atomically in the fastpath.

It's remarkable that the test program provided by David triggers really
quickly on ARM64 and MIPS64, but refuses to reproduce on x86-64, even though
the problem exists there as well. That refusal might explain why this was not
discovered earlier despite the bug existing from day one of the rtmutex
implementation more than 10 years ago.

Thanks to David for meticulously instrumenting the code and providing the
information which made it possible to decode this subtle problem.

Reported-by: David Daney <ddaney@caviumnetworks.com>
Tested-by: David Daney <david.daney@cavium.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rtmutex.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 66 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 1ec0f48..2c49d76 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -65,8 +65,72 @@ static inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
 
 static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
 {
-	if (!rt_mutex_has_waiters(lock))
-		clear_rt_mutex_waiters(lock);
+	unsigned long owner, *p = (unsigned long *) &lock->owner;
+
+	if (rt_mutex_has_waiters(lock))
+		return;
+
+	/*
+	 * The rbtree has no waiters enqueued, now make sure that the
+	 * lock->owner still has the waiters bit set, otherwise the
+	 * following can happen:
+	 *
+	 * CPU 0	CPU 1		CPU2
+	 * l->owner=T1
+	 *		rt_mutex_lock(l)
+	 *		lock(l->lock)
+	 *		l->owner = T1 | HAS_WAITERS;
+	 *		enqueue(T2)
+	 *		boost()
+	 *		  unlock(l->lock)
+	 *		block()
+	 *
+	 *				rt_mutex_lock(l)
+	 *				lock(l->lock)
+	 *				l->owner = T1 | HAS_WAITERS;
+	 *				enqueue(T3)
+	 *				boost()
+	 *				  unlock(l->lock)
+	 *				block()
+	 *		signal(->T2)	signal(->T3)
+	 *		lock(l->lock)
+	 *		dequeue(T2)
+	 *		deboost()
+	 *		  unlock(l->lock)
+	 *				lock(l->lock)
+	 *				dequeue(T3)
+	 *				 ==> wait list is empty
+	 *				deboost()
+	 *				 unlock(l->lock)
+	 *		lock(l->lock)
+	 *		fixup_rt_mutex_waiters()
+	 *		  if (wait_list_empty(l)) {
+	 *		    owner = l->owner & ~HAS_WAITERS;
+	 *		    l->owner = owner
+	 *		      ==> l->owner = T1
+	 *		  }
+	 *				lock(l->lock)
+	 * rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
+	 *				  if (wait_list_empty(l)) {
+	 *				    owner = l->owner & ~HAS_WAITERS;
+	 * cmpxchg(l->owner, T1, NULL)
+	 *  ===> Success (l->owner = NULL)
+	 *
+	 *				    l->owner = owner
+	 *				      ==> l->owner = T1
+	 *				  }
+	 *
+	 * With the check for the waiter bit in place T3 on CPU2 will not
+	 * overwrite. All tasks fiddling with the waiters bit are
+	 * serialized by l->lock, so nothing else can modify the waiters
+	 * bit. If the bit is set then nothing can change l->owner either
+	 * so the simple RMW is safe. The cmpxchg() will simply fail if it
+	 * happens in the middle of the RMW because the waiters bit is
+	 * still set.
+	 */
+	owner = READ_ONCE(*p);
+	if (owner & RT_MUTEX_HAS_WAITERS)
+		WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
 }
 
 /*


* [tip:locking/core] locking/rtmutex: Use READ_ONCE() in rt_mutex_owner()
  2016-11-30 21:04 ` [patch 2/4] rtmutex: Use READ_ONCE() in rt_mutex_owner() Thomas Gleixner
@ 2016-12-02 10:45   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-12-02 10:45 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mark.rutland, rostedt, linux-kernel, mingo, ddaney, hpa, stable,
	torvalds, peterz, bigeasy, will.deacon, tglx

Commit-ID:  1be5d4fa0af34fb7bafa205aeb59f5c7cc7a089d
Gitweb:     http://git.kernel.org/tip/1be5d4fa0af34fb7bafa205aeb59f5c7cc7a089d
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 30 Nov 2016 21:04:42 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 2 Dec 2016 11:13:26 +0100

locking/rtmutex: Use READ_ONCE() in rt_mutex_owner()

While debugging the rtmutex unlock vs. dequeue race, Will suggested using
READ_ONCE() in rt_mutex_owner() as it might race against the
cmpxchg_release() in unlock_rt_mutex_safe().

Will: "It's a minor thing which will most likely not matter in practice"

A careful search did not unearth an actual problem in today's code, but it's
better to be safe than surprised.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rtmutex_common.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 4f5f83c..e317e1c 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -75,8 +75,9 @@ task_top_pi_waiter(struct task_struct *p)
 
 static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
 {
-	return (struct task_struct *)
-		((unsigned long)lock->owner & ~RT_MUTEX_OWNER_MASKALL);
+	unsigned long owner = (unsigned long) READ_ONCE(lock->owner);
+
+	return (struct task_struct *) (owner & ~RT_MUTEX_OWNER_MASKALL);
 }
 
 /*


* [tip:locking/core] locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL
  2016-11-30 21:04 ` [patch 3/4] rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL Thomas Gleixner
@ 2016-12-02 10:46   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-12-02 10:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, hpa, tglx, rostedt, mingo, ddaney, mark.rutland,
	peterz, linux-kernel, will.deacon, bigeasy

Commit-ID:  b5016e8203003c44264ec88fe2276ff54a51f689
Gitweb:     http://git.kernel.org/tip/b5016e8203003c44264ec88fe2276ff54a51f689
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 30 Nov 2016 21:04:44 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 2 Dec 2016 11:13:57 +0100

locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL

This is a leftover from the original rtmutex implementation, which used
both bit0 and bit1 in the owner pointer. Commit:

  8161239a8bcc ("rtmutex: Simplify PI algorithm and make highest prio task get lock")

... removed the usage of bit1, but kept the extra mask around. This is
confusing at best.

Remove it and just use RT_MUTEX_HAS_WAITERS for the masking.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20161130210030.509567906@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rtmutex_common.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index e317e1c..9901346 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -71,13 +71,12 @@ task_top_pi_waiter(struct task_struct *p)
  * lock->owner state tracking:
  */
 #define RT_MUTEX_HAS_WAITERS	1UL
-#define RT_MUTEX_OWNER_MASKALL	1UL
 
 static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
 {
 	unsigned long owner = (unsigned long) READ_ONCE(lock->owner);
 
-	return (struct task_struct *) (owner & ~RT_MUTEX_OWNER_MASKALL);
+	return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS);
 }
 
 /*


* [tip:locking/core] locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked()
  2016-11-30 21:04 ` [patch 4/4] rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() Thomas Gleixner
@ 2016-12-02 10:46   ` tip-bot for Thomas Gleixner
  0 siblings, 0 replies; 14+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-12-02 10:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, linux-kernel, bigeasy, rostedt, torvalds, ddaney,
	mark.rutland, will.deacon, peterz, hpa, tglx

Commit-ID:  84d82ec5b9046ecdf16031d3e93a66ef50257402
Gitweb:     http://git.kernel.org/tip/84d82ec5b9046ecdf16031d3e93a66ef50257402
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Wed, 30 Nov 2016 21:04:45 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 2 Dec 2016 11:13:57 +0100

locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked()

While debugging the unlock vs. dequeue race which resulted in state
corruption of futexes, the lockless nature of rt_mutex_proxy_unlock()
caused some confusion.

Add commentary to explain why it is safe to do this locklessly. Add matching
comments to rt_mutex_init_proxy_locked() for completeness' sake.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/20161130210030.591941927@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/rtmutex.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6e6cab7..2f443ed 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1619,11 +1619,15 @@ EXPORT_SYMBOL_GPL(__rt_mutex_init);
  * rt_mutex_init_proxy_locked - initialize and lock a rt_mutex on behalf of a
  *				proxy owner
  *
- * @lock: 	the rt_mutex to be locked
+ * @lock:	the rt_mutex to be locked
  * @proxy_owner:the task to set as owner
  *
  * No locking. Caller has to do serializing itself
- * Special API call for PI-futex support
+ *
+ * Special API call for PI-futex support. This initializes the rtmutex and
+ * assigns it to @proxy_owner. Concurrent operations on the rtmutex are not
+ * possible at this point because the pi_state which contains the rtmutex
+ * is not yet visible to other tasks.
  */
 void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
 				struct task_struct *proxy_owner)
@@ -1637,10 +1641,14 @@ void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
 /**
  * rt_mutex_proxy_unlock - release a lock on behalf of owner
  *
- * @lock: 	the rt_mutex to be locked
+ * @lock:	the rt_mutex to be locked
  *
  * No locking. Caller has to do serializing itself
- * Special API call for PI-futex support
+ *
+ * Special API call for PI-futex support. This merrily cleans up the rtmutex
+ * (debugging) state. Concurrent operations on this rt_mutex are not
+ * possible because it belongs to the pi_state which is about to be freed
+ * and it is no longer visible to other tasks.
  */
 void rt_mutex_proxy_unlock(struct rt_mutex *lock,
 			   struct task_struct *proxy_owner)


