From: Steven Rostedt <rostedt@goodmis.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: LKML <linux-kernel@vger.kernel.org>,
	David Daney <ddaney@caviumnetworks.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Sebastian Siewior <bigeasy@linutronix.de>,
	Will Deacon <will.deacon@arm.com>,
	Mark Rutland <mark.rutland@arm.com>,
	stable@vger.kernel.org
Subject: Re: [patch 1/4] rtmutex: Prevent dequeue vs. unlock race
Date: Thu, 1 Dec 2016 19:53:06 -0500
Message-ID: <20161201195306.5474ccc8@gandalf.local.home>
In-Reply-To: <20161130210030.351136722@linutronix.de>

On Wed, 30 Nov 2016 21:04:41 -0000
Thomas Gleixner <tglx@linutronix.de> wrote:

> David reported a futex/rtmutex state corruption. It's caused by the
> following problem:
> 
> CPU0		CPU1		CPU2
> 
> l->owner=T1
> 		rt_mutex_lock(l)
> 		lock(l->wait_lock)
> 		l->owner = T1 | HAS_WAITERS;
> 		enqueue(T2)
> 		boost()
> 		  unlock(l->wait_lock)
> 		schedule()
> 
> 				rt_mutex_lock(l)
> 				lock(l->wait_lock)
> 				l->owner = T1 | HAS_WAITERS;
> 				enqueue(T3)
> 				boost()
> 				  unlock(l->wait_lock)
> 				schedule()
> 		signal(->T2)	signal(->T3)
> 		lock(l->wait_lock)
> 		dequeue(T2)
> 		deboost()
> 		  unlock(l->wait_lock)
> 				lock(l->wait_lock)
> 				dequeue(T3)
> 				  ===> wait list is now empty  
> 				deboost()
> 				 unlock(l->wait_lock)
> 		lock(l->wait_lock)
> 		fixup_rt_mutex_waiters()
> 		  if (wait_list_empty(l)) {
> 		    owner = l->owner & ~HAS_WAITERS;
>  		    l->owner = owner
> 		     ==> l->owner = T1  
> 		  }
> 
> 				lock(l->wait_lock)
> rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
> 				  if (wait_list_empty(l)) {
> 				    owner = l->owner & ~HAS_WAITERS;
> cmpxchg(l->owner, T1, NULL)
>  ===> Success (l->owner = NULL)  
> 				    l->owner = owner
> 				     ==> l->owner = T1  
> 				  }
> 
> That means the problem is caused by fixup_rt_mutex_waiters() which does the
> RMW to clear the waiters bit unconditionally when there are no waiters in
> the rtmutex's rbtree.
> 
> This can be fatal: A concurrent unlock can release the rtmutex in the
> fastpath because the waiters bit is not set. If the cmpxchg() gets in the
> middle of the RMW operation then the previous owner, which just unlocked
> the rtmutex, is set as the owner again when the write takes place after the
> successful cmpxchg().
> 
> The solution is rather trivial: Verify that the owner member of the rtmutex
> has the waiters bit set before clearing it. This does not require a
> cmpxchg() or other atomic operations because the waiters bit can only be
> set and cleared with the rtmutex wait_lock held. It's also safe against the
> fast path unlock attempt. The unlock attempt via cmpxchg() will either see
> the bit set and take the slowpath or see the bit cleared and release it
> atomically in the fastpath.
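
To make the last step of the diagram concrete, here is a minimal userspace
sketch that replays it single-threaded: the fixup RMW is split into its read
and write halves and the fast-path unlock is run in between. This is a toy
model, not kernel code. struct toy_lock, fast_unlock(), replay() and the T1
value are made up for illustration, and plain loads and stores stand in for
the real accessors because nothing runs concurrently here.

```c
#include <stdbool.h>
#include <stdint.h>

#define HAS_WAITERS ((uintptr_t)1)   /* stand-in for RT_MUTEX_HAS_WAITERS */
#define T1 ((uintptr_t)0x1000)       /* hypothetical task pointer value */

/* Toy lock word: owner pointer with the waiters flag in bit 0. */
struct toy_lock { uintptr_t owner; };

struct outcome {
	bool unlocked;       /* did the fast-path unlock succeed? */
	uintptr_t owner;     /* lock word after the replay */
};

/* Models cmpxchg(l->owner, task, NULL): succeeds only on an exact match. */
static bool fast_unlock(struct toy_lock *l, uintptr_t task)
{
	if (l->owner != task)
		return false;
	l->owner = 0;
	return true;
}

/*
 * Replay the tail of the race for a given starting lock word.
 * check_bit == false models the old unconditional clear,
 * check_bit == true models the check added by the patch.
 */
static struct outcome replay(uintptr_t initial, bool check_bit)
{
	struct toy_lock l = { .owner = initial };
	struct outcome out;

	/* CPU2: read half of the RMW in the fixup */
	uintptr_t seen = l.owner;

	/* CPU0: fast-path unlock sneaks in between read and write */
	out.unlocked = fast_unlock(&l, T1);

	/* CPU2: write half, optionally guarded by the waiters bit */
	if (!check_bit || (seen & HAS_WAITERS))
		l.owner = seen & ~HAS_WAITERS;

	out.owner = l.owner;
	return out;
}
```

With the bit already cleared (initial == T1), the unguarded variant ends with
owner == T1 even though the unlock succeeded, which is the corruption in the
diagram; the guarded variant leaves owner == 0. With the bit still set
(initial == T1 | HAS_WAITERS), the fast-path unlock fails and the write is
harmless.
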
> 
> It's remarkable that the test program provided by David triggers on ARM64
> and MIPS64 really quickly, but it refuses to reproduce on x86_64, while the
> problem exists there as well. That refusal might explain why this was not
> discovered earlier despite the bug existing from day one of the rtmutex
> implementation more than 10 years ago.

Because x86 is awesome! ;-)

> 
> Thanks to David for meticulously instrumenting the code and providing the
> information which allowed us to decode this subtle problem.
> 
> Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
> Reported-by: David Daney <ddaney@caviumnetworks.com>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> ---
>  kernel/locking/rtmutex.c |   68 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 66 insertions(+), 2 deletions(-)
> 
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -65,8 +65,72 @@ static inline void clear_rt_mutex_waiter
>  
>  static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
>  {
> -	if (!rt_mutex_has_waiters(lock))
> -		clear_rt_mutex_waiters(lock);

Hmm, clear_rt_mutex_waiters() now has only one user. Luckily that user
is the slow unlock path, where the wait lock is held and it's the owner
doing the update. Perhaps that function should go away and just be open
coded in its one use case, because it's part of the danger that bit us
here, and we don't want it used outside of an unlock.
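
Something like the following is what I mean by open coding it. This is a
userspace sketch with C11 atomics, not a patch: struct toy_rtmutex and
slow_unlock_sketch() are made-up names, the wait lock is only marked in
comments rather than modeled, and it assumes the remaining caller clears the
bit under the wait lock and then releases the lock with a cmpxchg.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HAS_WAITERS ((uintptr_t)1)   /* stand-in for RT_MUTEX_HAS_WAITERS */
#define T1 ((uintptr_t)0x1000)       /* hypothetical task pointer values */
#define T2 ((uintptr_t)0x2000)

/* Toy lock word: owner pointer with the waiters flag in bit 0. */
struct toy_rtmutex { atomic_uintptr_t owner; };

/*
 * Slow-path unlock with the bit clear written inline instead of
 * going through a helper. 'me' is the task that believes it owns
 * the lock.
 */
static bool slow_unlock_sketch(struct toy_rtmutex *l, uintptr_t me)
{
	/* --- wait lock assumed held from here (not modeled) --- */
	uintptr_t word = atomic_load_explicit(&l->owner, memory_order_relaxed);

	/* Open-coded clear: only legal because the owner does it under the lock. */
	atomic_store_explicit(&l->owner, word & ~HAS_WAITERS,
			      memory_order_relaxed);
	/* --- wait lock assumed dropped here --- */

	/* Release the lock only if the word is still exactly 'me'. */
	uintptr_t expected = me;
	return atomic_compare_exchange_strong_explicit(&l->owner, &expected, 0,
						       memory_order_release,
						       memory_order_relaxed);
}
```

With no helper left, nobody can call the unconditional clear from a context
where the caller is not the owner.
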

Reviewed-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve


> +	unsigned long owner, *p = (unsigned long *) &lock->owner;
> +
> +	if (rt_mutex_has_waiters(lock))
> +		return;
> +
> +	/*
> +	 * The rbtree has no waiters enqueued, now make sure that the
> +	 * lock->owner still has the waiters bit set, otherwise the
> +	 * following can happen:
> +	 *
> +	 * CPU 0	CPU 1		CPU2
> +	 * l->owner=T1
> +	 *		rt_mutex_lock(l)
> +	 *		lock(l->lock)
> +	 *		l->owner = T1 | HAS_WAITERS;
> +	 *		enqueue(T2)
> +	 *		boost()
> +	 *		  unlock(l->lock)
> +	 *		block()
> +	 *
> +	 *				rt_mutex_lock(l)
> +	 *				lock(l->lock)
> +	 *				l->owner = T1 | HAS_WAITERS;
> +	 *				enqueue(T3)
> +	 *				boost()
> +	 *				  unlock(l->lock)
> +	 *				block()
> +	 *		signal(->T2)	signal(->T3)
> +	 *		lock(l->lock)
> +	 *		dequeue(T2)
> +	 *		deboost()
> +	 *		  unlock(l->lock)
> +	 *				lock(l->lock)
> +	 *				dequeue(T3)
> +	 *				 ==> wait list is empty
> +	 *				deboost()
> +	 *				 unlock(l->lock)
> +	 *		lock(l->lock)
> +	 *		fixup_rt_mutex_waiters()
> +	 *		  if (wait_list_empty(l)) {
> +	 *		    owner = l->owner & ~HAS_WAITERS;
> +	 *		    l->owner = owner
> +	 *		      ==> l->owner = T1
> +	 *		  }
> +	 *				lock(l->lock)
> +	 * rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
> +	 *				  if (wait_list_empty(l)) {
> +	 *				    owner = l->owner & ~HAS_WAITERS;
> +	 * cmpxchg(l->owner, T1, NULL)
> +	 *  ===> Success (l->owner = NULL)
> +	 *
> +	 *				    l->owner = owner
> +	 *				      ==> l->owner = T1
> +	 *				  }
> +	 *
> +	 * With the check for the waiter bit in place T3 on CPU2 will not
> +	 * overwrite. All tasks fiddling with the waiters bit are
> +	 * serialized by l->lock, so nothing else can modify the waiters
> +	 * bit. If the bit is set then nothing can change l->owner either
> +	 * so the simple RMW is safe. The cmpxchg() will simply fail if it
> +	 * happens in the middle of the RMW because the waiters bit is
> +	 * still set.
> +	 */
> +	owner = READ_ONCE(*p);
> +	if (owner & RT_MUTEX_HAS_WAITERS)
> +		WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
>  }
>  
>  /*
> 
