All of lore.kernel.org
 help / color / mirror / Atom feed
* PI futexes + lock stealing woes
@ 2017-11-29 17:56 Julia Cartwright
  2017-12-01 20:11 ` Darren Hart
  2017-12-06 23:46 ` Peter Zijlstra
  0 siblings, 2 replies; 15+ messages in thread
From: Julia Cartwright @ 2017-11-29 17:56 UTC (permalink / raw)
  To: Thomas Gleixner, Peter Zijlstra
  Cc: Gratian Crisan, linux-kernel, Darren Hart, Ingo Molnar

Hey Thomas, Peter-

Gratian and I have been debugging into a nasty and difficult race w/
futexes seemingly the culprit.  The original symptom we were seeing
was a seemingly spurious -EDEADLK from a futex(LOCK_PI) operation.

On further analysis, however, it appears the thread which gets the
spurious -EDEADLK has observed a weird futex state: a prior
futex(WAIT_REQUEUE_PI) operation has returned -ETIMEDOUT, but the uaddr2
futex word owner field indicates that it's the owner.

Here's an attempt to boil down this situation into a pseudo trace; I'm
happy to forward along the full traces as well, if that would be
helpful:

   waiter                                  waker                                            stealer (prio > waiter)

   futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
         timeout=[N ms])
      futex_wait_requeue_pi()
         futex_wait_queue_me()
            freezable_schedule()
            <scheduled out>
                                           futex(LOCK_PI, uaddr2)
                                           futex(CMP_REQUEUE_PI, uaddr,
                                                 uaddr2, 1, 0)
                                              /* requeues waiter to uaddr2 */
                                           futex(UNLOCK_PI, uaddr2)
                                                 wake_futex_pi()
                                                    cmp_futex_value_locked(uaddr, waiter)
                                                    wake_up_q()
           <woken by waker>
           <hrtimer_wakeup() fires,
            clears sleeper->task>
                                                                                           futex(LOCK_PI, uaddr2)
                                                                                              __rt_mutex_start_proxy_lock()
                                                                                                 try_to_take_rt_mutex() /* steals lock */
                                                                                                    rt_mutex_set_owner(lock, stealer)
                                                                                              <preempted>
         <scheduled in>
         rt_mutex_wait_proxy_lock()
            __rt_mutex_slowlock()
               try_to_take_rt_mutex() /* fails, lock held by stealer */
               if (timeout && !timeout->task)
                  return -ETIMEDOUT;
            fixup_owner()
               /* lock wasn't acquired, so,
                  fixup_pi_state_owner skipped */
   return -ETIMEDOUT;

   /* At this point, we've returned -ETIMEDOUT to userspace, but the
    * futex word shows waiter to be the owner, and the pi_mutex has
    * stealer as the owner */

   futex_lock(LOCK_PI, uaddr2)
     -> bails with EDEADLK, futex word says we're owner.

At some later point in execution, the stealer gets scheduled back in and
will do fixup_owner() which fixes up the futex word, but at that point
it's too late: the waiter has already observed the wonky state.

fixup_owner() used to have additional seemingly relevant checks in place
that were removed 73d786bd043eb ("futex: Rework inconsistent
rt_mutex/futex_q state").

The actual kernel we've been testing is 4.9.33-rt23, w/ 153fbd1226fb3
("futex: Fix more put_pi_state() vs. exit_pi_state_list() races")
cherry-picked w/ PREEMPT_RT_FULL.  However, it appears that this issue
may affect v4.15-rc1?

Thoughts on how to move forward?

Nasty.

   Julia

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-11-29 17:56 PI futexes + lock stealing woes Julia Cartwright
@ 2017-12-01 20:11 ` Darren Hart
  2017-12-01 21:49   ` Julia Cartwright
  2017-12-06 23:46 ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Darren Hart @ 2017-12-01 20:11 UTC (permalink / raw)
  To: Julia Cartwright
  Cc: Thomas Gleixner, Peter Zijlstra, Gratian Crisan, linux-kernel,
	Ingo Molnar

On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
> Hey Thomas, Peter-
> 
> Gratian and I have been debugging into a nasty and difficult race w/
> futexes seemingly the culprit.  The original symptom we were seeing
> was a seemingly spurious -EDEADLK from a futex(LOCK_PI) operation.
> 
> On further analysis, however, it appears the thread which gets the
> spurious -EDEADLK has observed a weird futex state: a prior
> futex(WAIT_REQUEUE_PI) operation has returned -ETIMEDOUT, but the uaddr2
> futex word owner field indicates that it's the owner.
> 

Do you have a reproducer you can share?

> Here's an attempt to boil down this situation into a pseudo trace; I'm
> happy to forward along the full traces as well, if that would be
> helpful:

Please do forward the full trace

> 
>    waiter                                  waker                                            stealer (prio > waiter)
> 
>    futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
>          timeout=[N ms])
>       futex_wait_requeue_pi()
>          futex_wait_queue_me()
>             freezable_schedule()
>             <scheduled out>
>                                            futex(LOCK_PI, uaddr2)
>                                            futex(CMP_REQUEUE_PI, uaddr,
>                                                  uaddr2, 1, 0)
>                                               /* requeues waiter to uaddr2 */
>                                            futex(UNLOCK_PI, uaddr2)
>                                                  wake_futex_pi()
>                                                     cmp_futex_value_locked(uaddr, waiter)
>                                                     wake_up_q()
>            <woken by waker>
>            <hrtimer_wakeup() fires,
>             clears sleeper->task>
>                                                                                            futex(LOCK_PI, uaddr2)
>                                                                                               __rt_mutex_start_proxy_lock()
>                                                                                                  try_to_take_rt_mutex() /* steals lock */
>                                                                                                     rt_mutex_set_owner(lock, stealer)
>                                                                                               <preempted>
>          <scheduled in>
>          rt_mutex_wait_proxy_lock()


>             __rt_mutex_slowlock()
>                try_to_take_rt_mutex() /* fails, lock held by stealer */
>                if (timeout && !timeout->task)
>                   return -ETIMEDOUT;
>             fixup_owner()
>                /* lock wasn't acquired, so,
>                   fixup_pi_state_owner skipped */
>    return -ETIMEDOUT;
> 
>    /* At this point, we've returned -ETIMEDOUT to userspace, but the
>     * futex word shows waiter to be the owner, and the pi_mutex has
>     * stealer as the owner */
> 

eeeeeeewwwweeee


>    futex_lock(LOCK_PI, uaddr2)
>      -> bails with EDEADLK, futex word says we're owner.
> 
> At some later point in execution, the stealer gets scheduled back in and
> will do fixup_owner() which fixes up the futex word, but at that point
> it's too late: the waiter has already observed the wonky state.
> 
> fixup_owner() used to have additional seemingly relevant checks in place
> that were removed 73d786bd043eb ("futex: Rework inconsistent
> rt_mutex/futex_q state").

This and the subsequent changes moving some of this out from under the hb->lock
are interesting - and were quite fun to review at the time. Hrm.

I'll continue paging this stuff in, although I suspect Peter will likely beat me
to it. In the meantime, if you can share the reproducer and/or the trace you
collected, that will be helpful.

> 
> The actual kernel we've been testing is 4.9.33-rt23, w/ 153fbd1226fb3
> ("futex: Fix more put_pi_state() vs. exit_pi_state_list() races")

And this does not exhibit the behavior above, correct?

> cherry-picked w/ PREEMPT_RT_FULL.  However, it appears that this issue
> may affect v4.15-rc1?

And this does?

> 
> Thoughts on how to move forward?
> 
> Nasty.
> 
>    Julia
> 

-- 
Darren Hart
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-12-01 20:11 ` Darren Hart
@ 2017-12-01 21:49   ` Julia Cartwright
  2017-12-02  0:32     ` Darren Hart
  0 siblings, 1 reply; 15+ messages in thread
From: Julia Cartwright @ 2017-12-01 21:49 UTC (permalink / raw)
  To: Darren Hart
  Cc: Thomas Gleixner, Peter Zijlstra, Gratian Crisan, linux-kernel,
	Ingo Molnar

On Fri, Dec 01, 2017 at 12:11:15PM -0800, Darren Hart wrote:
> On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
> > Hey Thomas, Peter-
> > 
> > Gratian and I have been debugging into a nasty and difficult race w/
> > futexes seemingly the culprit.  The original symptom we were seeing
> > was a seemingly spurious -EDEADLK from a futex(LOCK_PI) operation.
> > 
> > On further analysis, however, it appears the thread which gets the
> > spurious -EDEADLK has observed a weird futex state: a prior
> > futex(WAIT_REQUEUE_PI) operation has returned -ETIMEDOUT, but the uaddr2
> > futex word owner field indicates that it's the owner.
> > 
> 
> Do you have a reproducer you can share?

We have a massive application which seems to reproduce it in 8 hours or
so, but it's not in a state to be shared :(.  So far, every attempt at
creating a simple, smaller reproducing case has failed.  We're still
trying, though :(.

One debugging technique we're trying to employ as well now that we think
we have a handle on the race is to pry the race window open with some
strategically placed spinning (or fixed-period sleeping).  Hopefully
that will make it easier to reproduce ...

> > Here's an attempt to boil down this situation into a pseudo trace; I'm
> > happy to forward along the full traces as well, if that would be
> > helpful:
> 
> Please do forward the full trace

Will do.  Chances are they are large enough to bounce from LKML, but
I'll send them along privately.

> > 
> >    waiter                                  waker                                            stealer (prio > waiter)
> > 
> >    futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
> >          timeout=[N ms])
> >       futex_wait_requeue_pi()
> >          futex_wait_queue_me()
> >             freezable_schedule()
> >             <scheduled out>
> >                                            futex(LOCK_PI, uaddr2)
> >                                            futex(CMP_REQUEUE_PI, uaddr,
> >                                                  uaddr2, 1, 0)
> >                                               /* requeues waiter to uaddr2 */
> >                                            futex(UNLOCK_PI, uaddr2)
> >                                                  wake_futex_pi()
> >                                                     cmp_futex_value_locked(uaddr, waiter)

minor fix: the above should have been:                  cmp_futex_value_locked(uaddr2, waiter)

> >                                                     wake_up_q()
> >            <woken by waker>
> >            <hrtimer_wakeup() fires,
> >             clears sleeper->task>
> >                                                                                            futex(LOCK_PI, uaddr2)
> >                                                                                               __rt_mutex_start_proxy_lock()
> >                                                                                                  try_to_take_rt_mutex() /* steals lock */
> >                                                                                                     rt_mutex_set_owner(lock, stealer)
> >                                                                                               <preempted>
> >          <scheduled in>
> >          rt_mutex_wait_proxy_lock()
> >             __rt_mutex_slowlock()
> >                try_to_take_rt_mutex() /* fails, lock held by stealer */
> >                if (timeout && !timeout->task)
> >                   return -ETIMEDOUT;
> >             fixup_owner()
> >                /* lock wasn't acquired, so,
> >                   fixup_pi_state_owner skipped */
> >    return -ETIMEDOUT;
> >
> >    /* At this point, we've returned -ETIMEDOUT to userspace, but the
> >     * futex word shows waiter to be the owner, and the pi_mutex has
> >     * stealer as the owner */
>
> eeeeeeewwwweeee

Indeed. :(

> >    futex_lock(LOCK_PI, uaddr2)
> >      -> bails with EDEADLK, futex word says we're owner.
> > 
> > At some later point in execution, the stealer gets scheduled back in and
> > will do fixup_owner() which fixes up the futex word, but at that point
> > it's too late: the waiter has already observed the wonky state.
> > 
> > fixup_owner() used to have additional seemingly relevant checks in place
> > that were removed 73d786bd043eb ("futex: Rework inconsistent
> > rt_mutex/futex_q state").
> 
> This and the subsequent changes moving some of this out from under the hb->lock
> are interesting - and were quite fun to review at the time. Hrm.
> 
> I'll continue paging this stuff in, although I suspect Peter will likely beat me
> to it. In the meantime, if you can share the reproducer and/or the trace you
> collected, that will be helpful.
> 
> > The actual kernel we've been testing is 4.9.33-rt23, w/ 153fbd1226fb3
> > ("futex: Fix more put_pi_state() vs. exit_pi_state_list() races")
> 
> And this does not exhibit the behavior above, correct?

Sorry if I was unclear.  This combination _does_ exhibit this incorrect
behavior.

> > cherry-picked w/ PREEMPT_RT_FULL.  However, it appears that this issue
> > may affect v4.15-rc1?
>
> And this does?

I only meant that: as far as I can tell the affected codepaths are
mostly the same between v4.9.33-rt23 and v4.15-rc1, as the futex
reworking stuff was cherry-picked back.

We haven't yet tried reproducing on v4.15-rc1, and aren't really at a
place where we can do so quickly.  It's unclear whether or not
PREEMPT_RT is required to reproduce.

Thanks!
   Julia

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-12-01 21:49   ` Julia Cartwright
@ 2017-12-02  0:32     ` Darren Hart
  0 siblings, 0 replies; 15+ messages in thread
From: Darren Hart @ 2017-12-02  0:32 UTC (permalink / raw)
  To: Julia Cartwright
  Cc: Thomas Gleixner, Peter Zijlstra, Gratian Crisan, linux-kernel,
	Ingo Molnar

On Fri, Dec 01, 2017 at 03:49:01PM -0600, Julia Cartwright wrote:
> On Fri, Dec 01, 2017 at 12:11:15PM -0800, Darren Hart wrote:
> > On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
> > 
> > > The actual kernel we've been testing is 4.9.33-rt23, w/ 153fbd1226fb3
> > > ("futex: Fix more put_pi_state() vs. exit_pi_state_list() races")
> > 
> > And this does not exhibit the behavior above, correct?
> 
> Sorry if I was unclear.  This combination _does_ exhibit this incorrect
> behavior.
> 
> > > cherry-picked w/ PREEMPT_RT_FULL.  However, it appears that this issue
> > > may affect v4.15-rc1?
> >
> > And this does?
> 
> I only meant that: as far as I can tell the affected codepaths are
> mostly the same between v4.9.33-rt23 and v4.15-rc1, as the futex
> reworking stuff was cherry-picked back.
> 

Can you compare the futex and rtmutex related histories and see if there
is possibly something missing from the backport? A diff from current
version of the relevant files would be worth doing as well. There is
enough subtly here, that I'd want to be as confident as possible that we
aren't missing something here.

> We haven't yet tried reproducing on v4.15-rc1, and aren't really at a
> place where we can do so quickly.  It's unclear whether or not
> PREEMPT_RT is required to reproduce.

Obviously reproducing on an unmodified upstream (RT or not) would be a
very valuable data point.

> 
> Thanks!
>    Julia
> 

-- 
Darren Hart
VMware Open Source Technology Center

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-11-29 17:56 PI futexes + lock stealing woes Julia Cartwright
  2017-12-01 20:11 ` Darren Hart
@ 2017-12-06 23:46 ` Peter Zijlstra
  2017-12-07  2:09   ` Gratian Crisan
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2017-12-06 23:46 UTC (permalink / raw)
  To: Julia Cartwright
  Cc: Thomas Gleixner, Gratian Crisan, linux-kernel, Darren Hart, Ingo Molnar

On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:

> fixup_owner() used to have additional seemingly relevant checks in place
> that were removed 73d786bd043eb ("futex: Rework inconsistent
> rt_mutex/futex_q state").

*groan*... yes. I completely missed that extra case also applied to
requeue_pi (requeue always did hurt my brain).

I got as far as actually understanding the problem; but its near 1am and
I desperately need a sleep, I'll see if I can come up with a solution
tomorrow. I just wanted to let you know I finally got around to looking
at this..

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-12-06 23:46 ` Peter Zijlstra
@ 2017-12-07  2:09   ` Gratian Crisan
  2017-12-07 10:45     ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Gratian Crisan @ 2017-12-07  2:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Julia Cartwright, Thomas Gleixner, Gratian Crisan, linux-kernel,
	Darren Hart, Ingo Molnar


Peter Zijlstra writes:

> On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
>
>> fixup_owner() used to have additional seemingly relevant checks in place
>> that were removed 73d786bd043eb ("futex: Rework inconsistent
>> rt_mutex/futex_q state").
>
> *groan*... yes. I completely missed that extra case also applied to
> requeue_pi (requeue always did hurt my brain).

FWIW I have been testing for about two days now with the fixup_owner()
hunk of 73d786bd043eb ("futex: Rework inconsistent rt_mutex/futex_q
state") reverted. So far it hasn't hit the race/deadlock. It normally
takes around 8 hours to reproduce.

I've also tried Julia's msleep() trick for expanding the race window for
the last 4 hours or so of testing and it seems to be still going.

Thanks,
    Gratian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-12-07  2:09   ` Gratian Crisan
@ 2017-12-07 10:45     ` Peter Zijlstra
  2017-12-07 14:26       ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2017-12-07 10:45 UTC (permalink / raw)
  To: Gratian Crisan
  Cc: Julia Cartwright, Thomas Gleixner, linux-kernel, Darren Hart,
	Ingo Molnar

On Wed, Dec 06, 2017 at 08:09:28PM -0600, Gratian Crisan wrote:
> 
> Peter Zijlstra writes:
> 
> > On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
> >
> >> fixup_owner() used to have additional seemingly relevant checks in place
> >> that were removed 73d786bd043eb ("futex: Rework inconsistent
> >> rt_mutex/futex_q state").
> >
> > *groan*... yes. I completely missed that extra case also applied to
> > requeue_pi (requeue always did hurt my brain).
> 
> FWIW I have been testing for about two days now with the fixup_owner()
> hunk of 73d786bd043eb ("futex: Rework inconsistent rt_mutex/futex_q
> state") reverted. So far it hasn't hit the race/deadlock. It normally
> takes around 8 hours to reproduce.

Yeah, that should more-or-less work I think. But I'm trying to see if
there's anything saner we can do, but so far my brain keeps slipping
off.

At the very least I want to kill the various wait_lock lockbreaks in
there, those hurt my brain and make me nervous as hell.  That fixup does
_3_ consecutive wait_lock sections, and it becomes a very complicated
story to argue why that's not riddled with holes.

For now I have something like the below; which obviously doesn't
compile yet. Let me grab lunch and such things before attempting more.

---
 kernel/futex.c                  | 33 ++++++++++++++++++++++++++++++++-
 kernel/locking/rtmutex.c        | 27 +++++++++++++++++++--------
 kernel/locking/rtmutex_common.h |  1 +
 3 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 76ed5921117a..8ad5221fbd84 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2303,14 +2303,35 @@ static void unqueue_me_pi(struct futex_q *q)
 static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 				struct task_struct *newowner)
 {
-	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
 	u32 uval, uninitialized_var(curval), newval;
 	struct task_struct *oldowner;
+	u32 newtid;
 	int ret;
 
 	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 
+	if (!newowner) {
+		if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+			ret = 0;
+			goto out_unlock;
+		}
+
+		newowner = rt_mutex_owner(&pi_state->pi_mutex);
+		if (WARN_ON_ONCE(!newowner)) {
+			/*
+			 * We just attempted a trylock; since that failed there
+			 * must be an owner, right?
+			 */
+			ret = -EFUCKED; /* XXX: check return paths */
+			goto out_unlock;
+		}
+
+		/* OK we have a newowner, fixup uval */
+	}
+
+	newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
 	oldowner = pi_state->owner;
 	/* Owner died? */
 	if (!pi_state->owner)
@@ -2443,6 +2464,16 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
 		goto out;
 	}
 
+	/*
+	 * If we didn't get the lock; check if nobody stole it from us.
+	 * In that case, we need to fix up the uval to point to them
+	 * instead of us, otherwise bad things happen.
+	 */
+	if (q->pi_state->owner == current) {
+		ret = fixup_pi_state_owner(uaddr, q, NULL);
+		goto out;
+	}
+
 	/*
 	 * Paranoia check. If we did not take the lock, then we should not be
 	 * the owner of the rt_mutex.
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6f3dba6e4e9e..21705f2fae1c 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1290,13 +1290,25 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 	return ret;
 }
 
+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+	int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+	/*
+	 * try_to_take_rt_mutex() sets the lock waiters bit
+	 * unconditionally. Clean this up.
+	 */
+	fixup_rt_mutex_waiters(lock);
+
+	return ret;
+}
+
 /*
  * Slow path try-lock function:
  */
 static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
 {
 	unsigned long flags;
-	int ret;
 
 	/*
 	 * If the lock already has an owner we fail to get the lock.
@@ -1312,13 +1324,7 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
 	 */
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 
-	ret = try_to_take_rt_mutex(lock, current, NULL);
-
-	/*
-	 * try_to_take_rt_mutex() sets the lock waiters bit
-	 * unconditionally. Clean this up.
-	 */
-	fixup_rt_mutex_waiters(lock);
+	ret = __rt_mutex_slowtrylock(lock);
 
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 
@@ -1505,6 +1511,11 @@ int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
 	return rt_mutex_slowtrylock(lock);
 }
 
+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+	return __rt_mutex_slowtrylock(lock);
+}
+
 /**
  * rt_mutex_timed_lock - lock a rt_mutex interruptible
  *			the timeout structure is provided
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 124e98ca0b17..68686b3ec3c1 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 				 struct rt_mutex_waiter *waiter);
 
 extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);
 
 extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
 extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-12-07 10:45     ` Peter Zijlstra
@ 2017-12-07 14:26       ` Peter Zijlstra
  2017-12-07 14:57         ` Gratian Crisan
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2017-12-07 14:26 UTC (permalink / raw)
  To: Gratian Crisan
  Cc: Julia Cartwright, Thomas Gleixner, linux-kernel, Darren Hart,
	Ingo Molnar

On Thu, Dec 07, 2017 at 11:45:16AM +0100, Peter Zijlstra wrote:
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 76ed5921117a..8ad5221fbd84 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2303,14 +2303,35 @@ static void unqueue_me_pi(struct futex_q *q)
>  static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
>  				struct task_struct *newowner)
>  {
> -	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
>  	struct futex_pi_state *pi_state = q->pi_state;
>  	u32 uval, uninitialized_var(curval), newval;
>  	struct task_struct *oldowner;
> +	u32 newtid;
>  	int ret;
>  
>  	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
>  
> +	if (!newowner) {
> +		if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
> +			ret = 0;
> +			goto out_unlock;
> +		}
> +
> +		newowner = rt_mutex_owner(&pi_state->pi_mutex);
> +		if (WARN_ON_ONCE(!newowner)) {
> +			/*
> +			 * We just attempted a trylock; since that failed there
> +			 * must be an owner, right?
> +			 */
> +			ret = -EFUCKED; /* XXX: check return paths */
> +			goto out_unlock;
> +		}
> +
> +		/* OK we have a newowner, fixup uval */
> +	}
> +
> +	newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
> +
>  	oldowner = pi_state->owner;
>  	/* Owner died? */
>  	if (!pi_state->owner)

OK, this is broken because it needs to be inside the retry label,
because we drop the locks when we take a fault. And if we drop the lock,
the rt_mutex owner can change etc..

> @@ -2443,6 +2464,16 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
>  		goto out;
>  	}
>  
> +	/*
> +	 * If we didn't get the lock; check if nobody stole it from us.
> +	 * In that case, we need to fix up the uval to point to them
> +	 * instead of us, otherwise bad things happen.
> +	 */
> +	if (q->pi_state->owner == current) {
> +		ret = fixup_pi_state_owner(uaddr, q, NULL);
> +		goto out;
> +	}

And this had me read the comment just above here that stated the
unlocked access of ->owner is fine, because only the rt_mutex owner
changes pi_state, except of course, this just changed that.

So now we need double check the state after we've taken the lock in
fixup_pi_state_owner().


The below compiles and boots, but is otherwise untested. Could you give
it a spin?

---
 kernel/futex.c                  | 83 +++++++++++++++++++++++++++++++++--------
 kernel/locking/rtmutex.c        | 26 +++++++++----
 kernel/locking/rtmutex_common.h |  1 +
 3 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 76ed5921117a..29ac5b64e7c7 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2294,21 +2294,17 @@ static void unqueue_me_pi(struct futex_q *q)
 	spin_unlock(q->lock_ptr);
 }
 
-/*
- * Fixup the pi_state owner with the new owner.
- *
- * Must be called with hash bucket lock held and mm->sem held for non
- * private futexes.
- */
 static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
-				struct task_struct *newowner)
+				struct task_struct *argowner)
 {
-	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
 	u32 uval, uninitialized_var(curval), newval;
-	struct task_struct *oldowner;
+	struct task_struct *oldowner, *newowner;
+	u32 newtid;
 	int ret;
 
+	lockdep_assert_held(q->lock_ptr);
+
 	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 
 	oldowner = pi_state->owner;
@@ -2317,11 +2313,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 		newtid |= FUTEX_OWNER_DIED;
 
 	/*
-	 * We are here either because we stole the rtmutex from the
-	 * previous highest priority waiter or we are the highest priority
-	 * waiter but have failed to get the rtmutex the first time.
+	 * We are here because either:
+	 *
+	 *  - we stole the lock and pi_state->owner needs updating to reflect
+	 *    that (@argowner == current),
 	 *
-	 * We have to replace the newowner TID in the user space variable.
+	 * or:
+	 *
+	 *  - someone stole our lock and we need to fix things to point to the
+	 *    new owner (@argowner == NULL).
+	 *
+	 * Either way, we have to replace the TID in the user space variable.
 	 * This must be atomic as we have to preserve the owner died bit here.
 	 *
 	 * Note: We write the user space value _before_ changing the pi_state
@@ -2334,6 +2336,42 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 	 * in the PID check in lookup_pi_state.
 	 */
 retry:
+	if (!argowner) {
+		if (oldowner != current) {
+			/*
+			 * We raced against a concurrent self; things are
+			 * already fixed up. Nothing to do.
+			 */
+			ret = 0;
+			goto out_unlock;
+		}
+
+		if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+			/* We got the lock after all, nothing to fix. */
+			ret = 0;
+			goto out_unlock;
+		}
+
+		/*
+		 * Since we just failed the trylock; there must be an owner.
+		 */
+		newowner = rt_mutex_owner(&pi_state->pi_mutex);
+		BUG_ON(!newowner);
+	} else {
+		WARN_ON_ONCE(argowner != current);
+		if (oldowner == current) {
+			/*
+			 * We raced against a concurrent self; things are
+			 * already fixed up. Nothing to do.
+			 */
+			ret = 0;
+			goto out_unlock;
+		}
+		newowner = argowner;
+	}
+
+	newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
 	if (get_futex_value_locked(&uval, uaddr))
 		goto handle_fault;
 
@@ -2434,15 +2472,28 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
 		 * Got the lock. We might not be the anticipated owner if we
 		 * did a lock-steal - fix up the PI-state in that case:
 		 *
-		 * We can safely read pi_state->owner without holding wait_lock
-		 * because we now own the rt_mutex, only the owner will attempt
-		 * to change it.
+		 * Speculative pi_state->owner read (we don't hold wait_lock);
+		 * since we own the lock pi_state->owner == current is the
+		 * stable state, anything else needs more attention.
 		 */
 		if (q->pi_state->owner != current)
 			ret = fixup_pi_state_owner(uaddr, q, current);
 		goto out;
 	}
 
+	/*
+	 * If we didn't get the lock; check if anybody stole it from us. In
+	 * that case, we need to fix up the uval to point to them instead of
+	 * us, otherwise bad things happen. [10]
+	 *
+	 * Another speculative read; pi_state->owner == current is unstable
+	 * but needs our attention.
+	 */
+	if (q->pi_state->owner == current) {
+		ret = fixup_pi_state_owner(uaddr, q, NULL);
+		goto out;
+	}
+
 	/*
 	 * Paranoia check. If we did not take the lock, then we should not be
 	 * the owner of the rt_mutex.
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6f3dba6e4e9e..65cc0cb984e6 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1290,6 +1290,19 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 	return ret;
 }
 
+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+	int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+	/*
+	 * try_to_take_rt_mutex() sets the lock waiters bit
+	 * unconditionally. Clean this up.
+	 */
+	fixup_rt_mutex_waiters(lock);
+
+	return ret;
+}
+
 /*
  * Slow path try-lock function:
  */
@@ -1312,13 +1325,7 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
 	 */
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 
-	ret = try_to_take_rt_mutex(lock, current, NULL);
-
-	/*
-	 * try_to_take_rt_mutex() sets the lock waiters bit
-	 * unconditionally. Clean this up.
-	 */
-	fixup_rt_mutex_waiters(lock);
+	ret = __rt_mutex_slowtrylock(lock);
 
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 
@@ -1505,6 +1512,11 @@ int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
 	return rt_mutex_slowtrylock(lock);
 }
 
+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+	return __rt_mutex_slowtrylock(lock);
+}
+
 /**
  * rt_mutex_timed_lock - lock a rt_mutex interruptible
  *			the timeout structure is provided
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 124e98ca0b17..68686b3ec3c1 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 				 struct rt_mutex_waiter *waiter);
 
 extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);
 
 extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
 extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-12-07 14:26       ` Peter Zijlstra
@ 2017-12-07 14:57         ` Gratian Crisan
  2017-12-07 19:50           ` Julia Cartwright
  0 siblings, 1 reply; 15+ messages in thread
From: Gratian Crisan @ 2017-12-07 14:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gratian Crisan, Julia Cartwright, Thomas Gleixner, linux-kernel,
	Darren Hart, Ingo Molnar


Peter Zijlstra writes:
> The below compiles and boots, but is otherwise untested. Could you give
> it a spin?

Thank you! Yes, I'll start a test now.

-Gratian

> ---
>  kernel/futex.c                  | 83 +++++++++++++++++++++++++++++++++--------
>  kernel/locking/rtmutex.c        | 26 +++++++++----
>  kernel/locking/rtmutex_common.h |  1 +
>  3 files changed, 87 insertions(+), 23 deletions(-)
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 76ed5921117a..29ac5b64e7c7 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2294,21 +2294,17 @@ static void unqueue_me_pi(struct futex_q *q)
>  	spin_unlock(q->lock_ptr);
>  }
>  
> -/*
> - * Fixup the pi_state owner with the new owner.
> - *
> - * Must be called with hash bucket lock held and mm->sem held for non
> - * private futexes.
> - */
>  static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
> -				struct task_struct *newowner)
> +				struct task_struct *argowner)
>  {
> -	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
>  	struct futex_pi_state *pi_state = q->pi_state;
>  	u32 uval, uninitialized_var(curval), newval;
> -	struct task_struct *oldowner;
> +	struct task_struct *oldowner, *newowner;
> +	u32 newtid;
>  	int ret;
>  
> +	lockdep_assert_held(q->lock_ptr);
> +
>  	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
>  
>  	oldowner = pi_state->owner;
> @@ -2317,11 +2313,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
>  		newtid |= FUTEX_OWNER_DIED;
>  
>  	/*
> -	 * We are here either because we stole the rtmutex from the
> -	 * previous highest priority waiter or we are the highest priority
> -	 * waiter but have failed to get the rtmutex the first time.
> +	 * We are here because either:
> +	 *
> +	 *  - we stole the lock and pi_state->owner needs updating to reflect
> +	 *    that (@argowner == current),
>  	 *
> -	 * We have to replace the newowner TID in the user space variable.
> +	 * or:
> +	 *
> +	 *  - someone stole our lock and we need to fix things to point to the
> +	 *    new owner (@argowner == NULL).
> +	 *
> +	 * Either way, we have to replace the TID in the user space variable.
>  	 * This must be atomic as we have to preserve the owner died bit here.
>  	 *
>  	 * Note: We write the user space value _before_ changing the pi_state
> @@ -2334,6 +2336,42 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
>  	 * in the PID check in lookup_pi_state.
>  	 */
>  retry:
> +	if (!argowner) {
> +		if (oldowner != current) {
> +			/*
> +			 * We raced against a concurrent self; things are
> +			 * already fixed up. Nothing to do.
> +			 */
> +			ret = 0;
> +			goto out_unlock;
> +		}
> +
> +		if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
> +			/* We got the lock after all, nothing to fix. */
> +			ret = 0;
> +			goto out_unlock;
> +		}
> +
> +		/*
> +		 * Since we just failed the trylock; there must be an owner.
> +		 */
> +		newowner = rt_mutex_owner(&pi_state->pi_mutex);
> +		BUG_ON(!newowner);
> +	} else {
> +		WARN_ON_ONCE(argowner != current);
> +		if (oldowner == current) {
> +			/*
> +			 * We raced against a concurrent self; things are
> +			 * already fixed up. Nothing to do.
> +			 */
> +			ret = 0;
> +			goto out_unlock;
> +		}
> +		newowner = argowner;
> +	}
> +
> +	newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
> +
>  	if (get_futex_value_locked(&uval, uaddr))
>  		goto handle_fault;
>  
> @@ -2434,15 +2472,28 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
>  		 * Got the lock. We might not be the anticipated owner if we
>  		 * did a lock-steal - fix up the PI-state in that case:
>  		 *
> -		 * We can safely read pi_state->owner without holding wait_lock
> -		 * because we now own the rt_mutex, only the owner will attempt
> -		 * to change it.
> +		 * Speculative pi_state->owner read (we don't hold wait_lock);
> +		 * since we own the lock pi_state->owner == current is the
> +		 * stable state, anything else needs more attention.
>  		 */
>  		if (q->pi_state->owner != current)
>  			ret = fixup_pi_state_owner(uaddr, q, current);
>  		goto out;
>  	}
>  
> +	/*
> +	 * If we didn't get the lock; check if anybody stole it from us. In
> +	 * that case, we need to fix up the uval to point to them instead of
> +	 * us, otherwise bad things happen. [10]
> +	 *
> +	 * Another speculative read; pi_state->owner == current is unstable
> +	 * but needs our attention.
> +	 */
> +	if (q->pi_state->owner == current) {
> +		ret = fixup_pi_state_owner(uaddr, q, NULL);
> +		goto out;
> +	}
> +
>  	/*
>  	 * Paranoia check. If we did not take the lock, then we should not be
>  	 * the owner of the rt_mutex.
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 6f3dba6e4e9e..65cc0cb984e6 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1290,6 +1290,19 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
>  	return ret;
>  }
>  
> +static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
> +{
> +	int ret = try_to_take_rt_mutex(lock, current, NULL);
> +
> +	/*
> +	 * try_to_take_rt_mutex() sets the lock waiters bit
> +	 * unconditionally. Clean this up.
> +	 */
> +	fixup_rt_mutex_waiters(lock);
> +
> +	return ret;
> +}
> +
>  /*
>   * Slow path try-lock function:
>   */
> @@ -1312,13 +1325,7 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
>  	 */
>  	raw_spin_lock_irqsave(&lock->wait_lock, flags);
>  
> -	ret = try_to_take_rt_mutex(lock, current, NULL);
> -
> -	/*
> -	 * try_to_take_rt_mutex() sets the lock waiters bit
> -	 * unconditionally. Clean this up.
> -	 */
> -	fixup_rt_mutex_waiters(lock);
> +	ret = __rt_mutex_slowtrylock(lock);
>  
>  	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
>  
> @@ -1505,6 +1512,11 @@ int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
>  	return rt_mutex_slowtrylock(lock);
>  }
>  
> +int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
> +{
> +	return __rt_mutex_slowtrylock(lock);
> +}
> +
>  /**
>   * rt_mutex_timed_lock - lock a rt_mutex interruptible
>   *			the timeout structure is provided
> diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
> index 124e98ca0b17..68686b3ec3c1 100644
> --- a/kernel/locking/rtmutex_common.h
> +++ b/kernel/locking/rtmutex_common.h
> @@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
>  				 struct rt_mutex_waiter *waiter);
>  
>  extern int rt_mutex_futex_trylock(struct rt_mutex *l);
> +extern int __rt_mutex_futex_trylock(struct rt_mutex *l);
>  
>  extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
>  extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-12-07 14:57         ` Gratian Crisan
@ 2017-12-07 19:50           ` Julia Cartwright
  2017-12-07 23:02             ` Gratian Crisan
  0 siblings, 1 reply; 15+ messages in thread
From: Julia Cartwright @ 2017-12-07 19:50 UTC (permalink / raw)
  To: Gratian Crisan
  Cc: Peter Zijlstra, Thomas Gleixner, linux-kernel, Darren Hart, Ingo Molnar

On Thu, Dec 07, 2017 at 08:57:59AM -0600, Gratian Crisan wrote:
> 
> Peter Zijlstra writes:
> > The below compiles and boots, but is otherwise untested. Could you give
> > it a spin?
> 
> Thank you! Yes, I'll start a test now.

I gave it a spin with my minimal reproducing case w/ the strategic
msleep().  Previously, I could reproduce it in less than a minute in
that setup; now it's been running for a couple hours.  So ... looks
good? :)

Thanks,
   Julia

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: PI futexes + lock stealing woes
  2017-12-07 19:50           ` Julia Cartwright
@ 2017-12-07 23:02             ` Gratian Crisan
  2017-12-08 12:49               ` [PATCH] futex: Avoid violating the 10th rule of futex Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Gratian Crisan @ 2017-12-07 23:02 UTC (permalink / raw)
  To: Julia Cartwright
  Cc: Gratian Crisan, Peter Zijlstra, Thomas Gleixner, linux-kernel,
	Darren Hart, Ingo Molnar


Julia Cartwright writes:

> On Thu, Dec 07, 2017 at 08:57:59AM -0600, Gratian Crisan wrote:
>>
>> Peter Zijlstra writes:
>> > The below compiles and boots, but is otherwise untested. Could you give
>> > it a spin?
>>
>> Thank you! Yes, I'll start a test now.
>
> I gave it a spin with my minimal reproducing case w/ the strategic
> msleep().  Previously, I could reproduce it in less than a minute in
> that setup; now it's been running for a couple hours.  So ... looks
> good? :)

Yep ... looks good to me. I've been running two targets with the
original reproducer for 8 hours now plus a target running the C test.
All of them are still going.

I'm going to let them run overnight to make sure but I'm going to call
it fixed.

-Gratian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] futex: Avoid violating the 10th rule of futex
  2017-12-07 23:02             ` Gratian Crisan
@ 2017-12-08 12:49               ` Peter Zijlstra
  2017-12-08 16:04                 ` Gratian Crisan
                                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2017-12-08 12:49 UTC (permalink / raw)
  To: Gratian Crisan
  Cc: Julia Cartwright, Thomas Gleixner, linux-kernel, Darren Hart,
	Ingo Molnar

On Thu, Dec 07, 2017 at 05:02:40PM -0600, Gratian Crisan wrote:

> Yep ... looks good to me. I've been running two targets with the
> original reproducer for 8 hours now plus a target running the C test.
> All of them are still going.
> 
> I'm going to let them run overnight to make sure but I'm going to call
> it fixed.

Assuming nothing bad happens; find below the patch with a Changelog
attached.

---
Subject: futex: Avoid violating the 10th rule of futex
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Dec  7 16:54:23 CET 2017

Julia reported futex state corruption in the following scenario:

   waiter                                  waker                                            stealer (prio > waiter)

   futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
         timeout=[N ms])
      futex_wait_requeue_pi()
         futex_wait_queue_me()
            freezable_schedule()
            <scheduled out>
                                           futex(LOCK_PI, uaddr2)
                                           futex(CMP_REQUEUE_PI, uaddr,
                                                 uaddr2, 1, 0)
                                              /* requeues waiter to uaddr2 */
                                           futex(UNLOCK_PI, uaddr2)
                                                 wake_futex_pi()
                                                    cmp_futex_value_locked(uaddr2, waiter)
                                                    wake_up_q()
           <woken by waker>
           <hrtimer_wakeup() fires,
            clears sleeper->task>
                                                                                           futex(LOCK_PI, uaddr2)
                                                                                              __rt_mutex_start_proxy_lock()
                                                                                                 try_to_take_rt_mutex() /* steals lock */
                                                                                                    rt_mutex_set_owner(lock, stealer)
                                                                                              <preempted>
         <scheduled in>
         rt_mutex_wait_proxy_lock()
            __rt_mutex_slowlock()
               try_to_take_rt_mutex() /* fails, lock held by stealer */
               if (timeout && !timeout->task)
                  return -ETIMEDOUT;
            fixup_owner()
               /* lock wasn't acquired, so,
                  fixup_pi_state_owner skipped */

   return -ETIMEDOUT;

   /* At this point, we've returned -ETIMEDOUT to userspace, but the
    * futex word shows waiter to be the owner, and the pi_mutex has
    * stealer as the owner */

   futex_lock(LOCK_PI, uaddr2)
     -> bails with EDEADLK, futex word says we're owner.

And suggested that what commit:

  73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")

removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:

  16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")

changed all the locking rules. Even without that, the sequence:

-               if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
-                       locked = 1;
-                       goto out;
-               }

-               raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
-               owner = rt_mutex_owner(&q->pi_state->pi_mutex);
-               if (!owner)
-                       owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
-               raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
-               ret = fixup_pi_state_owner(uaddr, q, owner);

already suggests there were races; otherwise we'd never have to look
at next_owner.

So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.

Luckily the values can only change away and not to the value we're
testing, this means we can do a speculative test and double check once
we have the wait_lock.

Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-and-Tested-by: Julia Cartwright <julia@ni.com>
Reported-and-Tested-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2294,21 +2294,17 @@ static void unqueue_me_pi(struct futex_q
 	spin_unlock(q->lock_ptr);
 }
 
-/*
- * Fixup the pi_state owner with the new owner.
- *
- * Must be called with hash bucket lock held and mm->sem held for non
- * private futexes.
- */
 static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
-				struct task_struct *newowner)
+				struct task_struct *argowner)
 {
-	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
 	u32 uval, uninitialized_var(curval), newval;
-	struct task_struct *oldowner;
+	struct task_struct *oldowner, *newowner;
+	u32 newtid;
 	int ret;
 
+	lockdep_assert_held(q->lock_ptr);
+
 	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 
 	oldowner = pi_state->owner;
@@ -2317,11 +2313,17 @@ static int fixup_pi_state_owner(u32 __us
 		newtid |= FUTEX_OWNER_DIED;
 
 	/*
-	 * We are here either because we stole the rtmutex from the
-	 * previous highest priority waiter or we are the highest priority
-	 * waiter but have failed to get the rtmutex the first time.
+	 * We are here because either:
+	 *
+	 *  - we stole the lock and pi_state->owner needs updating to reflect
+	 *    that (@argowner == current),
+	 *
+	 * or:
 	 *
-	 * We have to replace the newowner TID in the user space variable.
+	 *  - someone stole our lock and we need to fix things to point to the
+	 *    new owner (@argowner == NULL).
+	 *
+	 * Either way, we have to replace the TID in the user space variable.
 	 * This must be atomic as we have to preserve the owner died bit here.
 	 *
 	 * Note: We write the user space value _before_ changing the pi_state
@@ -2334,6 +2336,42 @@ static int fixup_pi_state_owner(u32 __us
 	 * in the PID check in lookup_pi_state.
 	 */
 retry:
+	if (!argowner) {
+		if (oldowner != current) {
+			/*
+			 * We raced against a concurrent self; things are
+			 * already fixed up. Nothing to do.
+			 */
+			ret = 0;
+			goto out_unlock;
+		}
+
+		if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+			/* We got the lock after all, nothing to fix. */
+			ret = 0;
+			goto out_unlock;
+		}
+
+		/*
+		 * Since we just failed the trylock; there must be an owner.
+		 */
+		newowner = rt_mutex_owner(&pi_state->pi_mutex);
+		BUG_ON(!newowner);
+	} else {
+		WARN_ON_ONCE(argowner != current);
+		if (oldowner == current) {
+			/*
+			 * We raced against a concurrent self; things are
+			 * already fixed up. Nothing to do.
+			 */
+			ret = 0;
+			goto out_unlock;
+		}
+		newowner = argowner;
+	}
+
+	newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
 	if (get_futex_value_locked(&uval, uaddr))
 		goto handle_fault;
 
@@ -2434,15 +2472,28 @@ static int fixup_owner(u32 __user *uaddr
 		 * Got the lock. We might not be the anticipated owner if we
 		 * did a lock-steal - fix up the PI-state in that case:
 		 *
-		 * We can safely read pi_state->owner without holding wait_lock
-		 * because we now own the rt_mutex, only the owner will attempt
-		 * to change it.
+		 * Speculative pi_state->owner read (we don't hold wait_lock);
+		 * since we own the lock pi_state->owner == current is the
+		 * stable state, anything else needs more attention.
 		 */
 		if (q->pi_state->owner != current)
 			ret = fixup_pi_state_owner(uaddr, q, current);
 		goto out;
 	}
 
+	/*
+	 * If we didn't get the lock; check if anybody stole it from us. In
+	 * that case, we need to fix up the uval to point to them instead of
+	 * us, otherwise bad things happen. [10]
+	 *
+	 * Another speculative read; pi_state->owner == current is unstable
+	 * but needs our attention.
+	 */
+	if (q->pi_state->owner == current) {
+		ret = fixup_pi_state_owner(uaddr, q, NULL);
+		goto out;
+	}
+
 	/*
 	 * Paranoia check. If we did not take the lock, then we should not be
 	 * the owner of the rt_mutex.
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1290,6 +1290,19 @@ rt_mutex_slowlock(struct rt_mutex *lock,
 	return ret;
 }
 
+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+	int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+	/*
+	 * try_to_take_rt_mutex() sets the lock waiters bit
+	 * unconditionally. Clean this up.
+	 */
+	fixup_rt_mutex_waiters(lock);
+
+	return ret;
+}
+
 /*
  * Slow path try-lock function:
  */
@@ -1312,13 +1325,7 @@ static inline int rt_mutex_slowtrylock(s
 	 */
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 
-	ret = try_to_take_rt_mutex(lock, current, NULL);
-
-	/*
-	 * try_to_take_rt_mutex() sets the lock waiters bit
-	 * unconditionally. Clean this up.
-	 */
-	fixup_rt_mutex_waiters(lock);
+	ret = __rt_mutex_slowtrylock(lock);
 
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 
@@ -1505,6 +1512,11 @@ int __sched rt_mutex_futex_trylock(struc
 	return rt_mutex_slowtrylock(lock);
 }
 
+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+	return __rt_mutex_slowtrylock(lock);
+}
+
 /**
  * rt_mutex_timed_lock - lock a rt_mutex interruptible
  *			the timeout structure is provided
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(
 				 struct rt_mutex_waiter *waiter);
 
 extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);
 
 extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
 extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] futex: Avoid violating the 10th rule of futex
  2017-12-08 12:49               ` [PATCH] futex: Avoid violating the 10th rule of futex Peter Zijlstra
@ 2017-12-08 16:04                 ` Gratian Crisan
  2018-01-08 21:09                 ` Julia Cartwright
  2018-01-14 18:06                 ` [tip:locking/urgent] " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 15+ messages in thread
From: Gratian Crisan @ 2017-12-08 16:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gratian Crisan, Julia Cartwright, Thomas Gleixner, linux-kernel,
	Darren Hart, Ingo Molnar


Peter Zijlstra writes:

> On Thu, Dec 07, 2017 at 05:02:40PM -0600, Gratian Crisan wrote:
>
>> Yep ... looks good to me. I've been running two targets with the
>> original reproducer for 8 hours now plus a target running the C test.
>> All of them are still going.
>> 
>> I'm going to let them run overnight to make sure but I'm going to call
>> it fixed.
>
> Assuming nothing bad happens; find below the patch with a Changelog
> attached.

Looks good after overnight testing. Thank you so much.

-Gratian

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] futex: Avoid violating the 10th rule of futex
  2017-12-08 12:49               ` [PATCH] futex: Avoid violating the 10th rule of futex Peter Zijlstra
  2017-12-08 16:04                 ` Gratian Crisan
@ 2018-01-08 21:09                 ` Julia Cartwright
  2018-01-14 18:06                 ` [tip:locking/urgent] " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 15+ messages in thread
From: Julia Cartwright @ 2018-01-08 21:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Gratian Crisan, Thomas Gleixner, linux-kernel, Darren Hart, Ingo Molnar

On Fri, Dec 08, 2017 at 01:49:39PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 07, 2017 at 05:02:40PM -0600, Gratian Crisan wrote:
[..]
> Assuming nothing bad happens; find below the patch with a Changelog
> attached.
> 
> ---
> Subject: futex: Avoid violating the 10th rule of futex
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Thu Dec  7 16:54:23 CET 2017
> 
> Julia reported futex state corruption in the following scenario:
> 
>    waiter                                  waker                                            stealer (prio > waiter)
> 
>    futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
>          timeout=[N ms])
>       futex_wait_requeue_pi()
>          futex_wait_queue_me()
>             freezable_schedule()
>             <scheduled out>
>                                            futex(LOCK_PI, uaddr2)
>                                            futex(CMP_REQUEUE_PI, uaddr,
>                                                  uaddr2, 1, 0)
>                                               /* requeues waiter to uaddr2 */
>                                            futex(UNLOCK_PI, uaddr2)
>                                                  wake_futex_pi()
>                                                     cmp_futex_value_locked(uaddr2, waiter)
>                                                     wake_up_q()
>            <woken by waker>
>            <hrtimer_wakeup() fires,
>             clears sleeper->task>
>                                                                                            futex(LOCK_PI, uaddr2)
>                                                                                               __rt_mutex_start_proxy_lock()
>                                                                                                  try_to_take_rt_mutex() /* steals lock */
>                                                                                                     rt_mutex_set_owner(lock, stealer)
>                                                                                               <preempted>
>          <scheduled in>
>          rt_mutex_wait_proxy_lock()
>             __rt_mutex_slowlock()
>                try_to_take_rt_mutex() /* fails, lock held by stealer */
>                if (timeout && !timeout->task)
>                   return -ETIMEDOUT;
>             fixup_owner()
>                /* lock wasn't acquired, so,
>                   fixup_pi_state_owner skipped */
> 
>    return -ETIMEDOUT;
> 
>    /* At this point, we've returned -ETIMEDOUT to userspace, but the
>     * futex word shows waiter to be the owner, and the pi_mutex has
>     * stealer as the owner */
> 
>    futex_lock(LOCK_PI, uaddr2)
>      -> bails with EDEADLK, futex word says we're owner.
> 
> And suggested that what commit:
> 
>   73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
> 
> removes from fixup_owner() looks to be just what is needed. And indeed
> it is -- I completely missed that requeue_pi could also result in this
> case. So we need to restore that, except that subsequent patches, like
> commit:
> 
>   16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")
> 
> changed all the locking rules. Even without that, the sequence:
> 
> -               if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
> -                       locked = 1;
> -                       goto out;
> -               }
> 
> -               raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
> -               owner = rt_mutex_owner(&q->pi_state->pi_mutex);
> -               if (!owner)
> -                       owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
> -               raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
> -               ret = fixup_pi_state_owner(uaddr, q, owner);
> 
> already suggests there were races; otherwise we'd never have to look
> at next_owner.
> 
> So instead of doing 3 consecutive wait_lock sections with who knows
> what races, we do it all in a single section. Additionally, the usage
> of pi_state->owner in fixup_owner() was only safe because only the
> rt_mutex owner would modify it, which this additional case wrecks.
> 
> Luckily the values can only change away and not to the value we're
> testing, this means we can do a speculative test and double check once
> we have the wait_lock.
> 
> Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
> Reported-and-Tested-by: Julia Cartwright <julia@ni.com>
> Reported-and-Tested-by: Gratian Crisan <gratian.crisan@ni.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Hey Peter-

We've been running w/ this patch now without further regression.  I was
expecting to see this hit 4.15-rc* eventually, but haven't seen it land
anywhere.  Is this in the queue for 4.16, then?

Thanks!
   Julia

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [tip:locking/urgent] futex: Avoid violating the 10th rule of futex
  2017-12-08 12:49               ` [PATCH] futex: Avoid violating the 10th rule of futex Peter Zijlstra
  2017-12-08 16:04                 ` Gratian Crisan
  2018-01-08 21:09                 ` Julia Cartwright
@ 2018-01-14 18:06                 ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 15+ messages in thread
From: tip-bot for Peter Zijlstra @ 2018-01-14 18:06 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: gratian.crisan, peterz, julia, tglx, mingo, linux-kernel, dvhart, hpa

Commit-ID:  c1e2f0eaf015fb7076d51a339011f2383e6dd389
Gitweb:     https://git.kernel.org/tip/c1e2f0eaf015fb7076d51a339011f2383e6dd389
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 8 Dec 2017 13:49:39 +0100
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sun, 14 Jan 2018 18:49:16 +0100

futex: Avoid violating the 10th rule of futex

Julia reported futex state corruption in the following scenario:

   waiter                                  waker                                            stealer (prio > waiter)

   futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
         timeout=[N ms])
      futex_wait_requeue_pi()
         futex_wait_queue_me()
            freezable_schedule()
            <scheduled out>
                                           futex(LOCK_PI, uaddr2)
                                           futex(CMP_REQUEUE_PI, uaddr,
                                                 uaddr2, 1, 0)
                                              /* requeues waiter to uaddr2 */
                                           futex(UNLOCK_PI, uaddr2)
                                                 wake_futex_pi()
                                                    cmp_futex_value_locked(uaddr2, waiter)
                                                    wake_up_q()
           <woken by waker>
           <hrtimer_wakeup() fires,
            clears sleeper->task>
                                                                                           futex(LOCK_PI, uaddr2)
                                                                                              __rt_mutex_start_proxy_lock()
                                                                                                 try_to_take_rt_mutex() /* steals lock */
                                                                                                    rt_mutex_set_owner(lock, stealer)
                                                                                              <preempted>
         <scheduled in>
         rt_mutex_wait_proxy_lock()
            __rt_mutex_slowlock()
               try_to_take_rt_mutex() /* fails, lock held by stealer */
               if (timeout && !timeout->task)
                  return -ETIMEDOUT;
            fixup_owner()
               /* lock wasn't acquired, so,
                  fixup_pi_state_owner skipped */

   return -ETIMEDOUT;

   /* At this point, we've returned -ETIMEDOUT to userspace, but the
    * futex word shows waiter to be the owner, and the pi_mutex has
    * stealer as the owner */

   futex_lock(LOCK_PI, uaddr2)
     -> bails with EDEADLK, futex word says we're owner.

And suggested that what commit:

  73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")

removes from fixup_owner() looks to be just what is needed. And indeed
it is -- I completely missed that requeue_pi could also result in this
case. So we need to restore that, except that subsequent patches, like
commit:

  16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")

changed all the locking rules. Even without that, the sequence:

-               if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
-                       locked = 1;
-                       goto out;
-               }

-               raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
-               owner = rt_mutex_owner(&q->pi_state->pi_mutex);
-               if (!owner)
-                       owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
-               raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
-               ret = fixup_pi_state_owner(uaddr, q, owner);

already suggests there were races; otherwise we'd never have to look
at next_owner.

So instead of doing 3 consecutive wait_lock sections with who knows
what races, we do it all in a single section. Additionally, the usage
of pi_state->owner in fixup_owner() was only safe because only the
rt_mutex owner would modify it, which this additional case wrecks.

Luckily the values can only change away and not to the value we're
testing, this means we can do a speculative test and double check once
we have the wait_lock.

Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-by: Julia Cartwright <julia@ni.com>
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Julia Cartwright <julia@ni.com>
Tested-by: Gratian Crisan <gratian.crisan@ni.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65rrc@hirez.programming.kicks-ass.net
---
 kernel/futex.c                  | 83 +++++++++++++++++++++++++++++++++--------
 kernel/locking/rtmutex.c        | 26 +++++++++----
 kernel/locking/rtmutex_common.h |  1 +
 3 files changed, 87 insertions(+), 23 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 57d0b36..9e69589 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2294,21 +2294,17 @@ static void unqueue_me_pi(struct futex_q *q)
 	spin_unlock(q->lock_ptr);
 }
 
-/*
- * Fixup the pi_state owner with the new owner.
- *
- * Must be called with hash bucket lock held and mm->sem held for non
- * private futexes.
- */
 static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
-				struct task_struct *newowner)
+				struct task_struct *argowner)
 {
-	u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
 	struct futex_pi_state *pi_state = q->pi_state;
 	u32 uval, uninitialized_var(curval), newval;
-	struct task_struct *oldowner;
+	struct task_struct *oldowner, *newowner;
+	u32 newtid;
 	int ret;
 
+	lockdep_assert_held(q->lock_ptr);
+
 	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 
 	oldowner = pi_state->owner;
@@ -2317,11 +2313,17 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 		newtid |= FUTEX_OWNER_DIED;
 
 	/*
-	 * We are here either because we stole the rtmutex from the
-	 * previous highest priority waiter or we are the highest priority
-	 * waiter but have failed to get the rtmutex the first time.
+	 * We are here because either:
+	 *
+	 *  - we stole the lock and pi_state->owner needs updating to reflect
+	 *    that (@argowner == current),
 	 *
-	 * We have to replace the newowner TID in the user space variable.
+	 * or:
+	 *
+	 *  - someone stole our lock and we need to fix things to point to the
+	 *    new owner (@argowner == NULL).
+	 *
+	 * Either way, we have to replace the TID in the user space variable.
 	 * This must be atomic as we have to preserve the owner died bit here.
 	 *
 	 * Note: We write the user space value _before_ changing the pi_state
@@ -2334,6 +2336,42 @@ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q,
 	 * in the PID check in lookup_pi_state.
 	 */
 retry:
+	if (!argowner) {
+		if (oldowner != current) {
+			/*
+			 * We raced against a concurrent self; things are
+			 * already fixed up. Nothing to do.
+			 */
+			ret = 0;
+			goto out_unlock;
+		}
+
+		if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) {
+			/* We got the lock after all, nothing to fix. */
+			ret = 0;
+			goto out_unlock;
+		}
+
+		/*
+		 * Since we just failed the trylock; there must be an owner.
+		 */
+		newowner = rt_mutex_owner(&pi_state->pi_mutex);
+		BUG_ON(!newowner);
+	} else {
+		WARN_ON_ONCE(argowner != current);
+		if (oldowner == current) {
+			/*
+			 * We raced against a concurrent self; things are
+			 * already fixed up. Nothing to do.
+			 */
+			ret = 0;
+			goto out_unlock;
+		}
+		newowner = argowner;
+	}
+
+	newtid = task_pid_vnr(newowner) | FUTEX_WAITERS;
+
 	if (get_futex_value_locked(&uval, uaddr))
 		goto handle_fault;
 
@@ -2434,9 +2472,9 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
 		 * Got the lock. We might not be the anticipated owner if we
 		 * did a lock-steal - fix up the PI-state in that case:
 		 *
-		 * We can safely read pi_state->owner without holding wait_lock
-		 * because we now own the rt_mutex, only the owner will attempt
-		 * to change it.
+		 * Speculative pi_state->owner read (we don't hold wait_lock);
+		 * since we own the lock pi_state->owner == current is the
+		 * stable state, anything else needs more attention.
 		 */
 		if (q->pi_state->owner != current)
 			ret = fixup_pi_state_owner(uaddr, q, current);
@@ -2444,6 +2482,19 @@ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked)
 	}
 
 	/*
+	 * If we didn't get the lock; check if anybody stole it from us. In
+	 * that case, we need to fix up the uval to point to them instead of
+	 * us, otherwise bad things happen. [10]
+	 *
+	 * Another speculative read; pi_state->owner == current is unstable
+	 * but needs our attention.
+	 */
+	if (q->pi_state->owner == current) {
+		ret = fixup_pi_state_owner(uaddr, q, NULL);
+		goto out;
+	}
+
+	/*
 	 * Paranoia check. If we did not take the lock, then we should not be
 	 * the owner of the rt_mutex.
 	 */
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 6f3dba6..65cc0cb 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1290,6 +1290,19 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 	return ret;
 }
 
+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock)
+{
+	int ret = try_to_take_rt_mutex(lock, current, NULL);
+
+	/*
+	 * try_to_take_rt_mutex() sets the lock waiters bit
+	 * unconditionally. Clean this up.
+	 */
+	fixup_rt_mutex_waiters(lock);
+
+	return ret;
+}
+
 /*
  * Slow path try-lock function:
  */
@@ -1312,13 +1325,7 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock)
 	 */
 	raw_spin_lock_irqsave(&lock->wait_lock, flags);
 
-	ret = try_to_take_rt_mutex(lock, current, NULL);
-
-	/*
-	 * try_to_take_rt_mutex() sets the lock waiters bit
-	 * unconditionally. Clean this up.
-	 */
-	fixup_rt_mutex_waiters(lock);
+	ret = __rt_mutex_slowtrylock(lock);
 
 	raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
 
@@ -1505,6 +1512,11 @@ int __sched rt_mutex_futex_trylock(struct rt_mutex *lock)
 	return rt_mutex_slowtrylock(lock);
 }
 
+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock)
+{
+	return __rt_mutex_slowtrylock(lock);
+}
+
 /**
  * rt_mutex_timed_lock - lock a rt_mutex interruptible
  *			the timeout structure is provided
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 124e98c..68686b3 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -148,6 +148,7 @@ extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
 				 struct rt_mutex_waiter *waiter);
 
 extern int rt_mutex_futex_trylock(struct rt_mutex *l);
+extern int __rt_mutex_futex_trylock(struct rt_mutex *l);
 
 extern void rt_mutex_futex_unlock(struct rt_mutex *lock);
 extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-01-14 18:09 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-29 17:56 PI futexes + lock stealing woes Julia Cartwright
2017-12-01 20:11 ` Darren Hart
2017-12-01 21:49   ` Julia Cartwright
2017-12-02  0:32     ` Darren Hart
2017-12-06 23:46 ` Peter Zijlstra
2017-12-07  2:09   ` Gratian Crisan
2017-12-07 10:45     ` Peter Zijlstra
2017-12-07 14:26       ` Peter Zijlstra
2017-12-07 14:57         ` Gratian Crisan
2017-12-07 19:50           ` Julia Cartwright
2017-12-07 23:02             ` Gratian Crisan
2017-12-08 12:49               ` [PATCH] futex: Avoid violating the 10th rule of futex Peter Zijlstra
2017-12-08 16:04                 ` Gratian Crisan
2018-01-08 21:09                 ` Julia Cartwright
2018-01-14 18:06                 ` [tip:locking/urgent] " tip-bot for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.