All of lore.kernel.org
 help / color / mirror / Atom feed
From: Julia Cartwright <julia@ni.com>
To: Darren Hart <dvhart@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Gratian Crisan <gratian.crisan@ni.com>,
	<linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@redhat.com>
Subject: Re: PI futexes + lock stealing woes
Date: Fri, 1 Dec 2017 15:49:01 -0600	[thread overview]
Message-ID: <20171201214901.GB32696@jcartwri.amer.corp.natinst.com> (raw)
In-Reply-To: <20171201201115.GB18881@fury>

On Fri, Dec 01, 2017 at 12:11:15PM -0800, Darren Hart wrote:
> On Wed, Nov 29, 2017 at 11:56:05AM -0600, Julia Cartwright wrote:
> > Hey Thomas, Peter-
> > 
> > Gratian and I have been debugging into a nasty and difficult race w/
> > futexes seemingly the culprit.  The original symptom we were seeing
> > was a seemingly spurious -EDEADLK from a futex(LOCK_PI) operation.
> > 
> > On further analysis, however, it appears the thread which gets the
> > spurious -EDEADLK has observed a weird futex state: a prior
> > futex(WAIT_REQUEUE_PI) operation has returned -ETIMEDOUT, but the uaddr2
> > futex word owner field indicates that it's the owner.
> > 
> 
> Do you have a reproducer you can share?

We have a massive application which seems to reproduce it in 8 hours or
so, but it's not in a state to be shared :(.  So far, every attempt at
creating a simple, smaller reproducing case has failed.  We're still
trying, though :(.

One debugging technique we're trying to employ as well now that we think
we have a handle on the race is to pry the race window open with some
strategically placed spinning (or fixed-period sleeping).  Hopefully
that will make it easier to reproduce ...

> > Here's an attempt to boil down this situation into a pseudo trace; I'm
> > happy to forward along the full traces as well, if that would be
> > helpful:
> 
> Please do forward the full trace

Will do.  Chances are they are large enough to bounce from LKML, but
I'll send them along privately.

> > 
> >    waiter                                  waker                                            stealer (prio > waiter)
> > 
> >    futex(WAIT_REQUEUE_PI, uaddr, uaddr2,
> >          timeout=[N ms])
> >       futex_wait_requeue_pi()
> >          futex_wait_queue_me()
> >             freezable_schedule()
> >             <scheduled out>
> >                                            futex(LOCK_PI, uaddr2)
> >                                            futex(CMP_REQUEUE_PI, uaddr,
> >                                                  uaddr2, 1, 0)
> >                                               /* requeues waiter to uaddr2 */
> >                                            futex(UNLOCK_PI, uaddr2)
> >                                                  wake_futex_pi()
> >                                                     cmp_futex_value_locked(uaddr, waiter)

minor fix: the above should have been:                  cmp_futex_value_locked(uaddr2, waiter)

> >                                                     wake_up_q()
> >            <woken by waker>
> >            <hrtimer_wakeup() fires,
> >             clears sleeper->task>
> >                                                                                            futex(LOCK_PI, uaddr2)
> >                                                                                               __rt_mutex_start_proxy_lock()
> >                                                                                                  try_to_take_rt_mutex() /* steals lock */
> >                                                                                                     rt_mutex_set_owner(lock, stealer)
> >                                                                                               <preempted>
> >          <scheduled in>
> >          rt_mutex_wait_proxy_lock()
> >             __rt_mutex_slowlock()
> >                try_to_take_rt_mutex() /* fails, lock held by stealer */
> >                if (timeout && !timeout->task)
> >                   return -ETIMEDOUT;
> >             fixup_owner()
> >                /* lock wasn't acquired, so,
> >                   fixup_pi_state_owner skipped */
> >    return -ETIMEDOUT;
> >
> >    /* At this point, we've returned -ETIMEDOUT to userspace, but the
> >     * futex word shows waiter to be the owner, and the pi_mutex has
> >     * stealer as the owner */
>
> eeeeeeewwwweeee

Indeed. :(

> >    futex_lock(LOCK_PI, uaddr2)
> >      -> bails with EDEADLK, futex word says we're owner.
> > 
> > At some later point in execution, the stealer gets scheduled back in and
> > will do fixup_owner() which fixes up the futex word, but at that point
> > it's too late: the waiter has already observed the wonky state.
> > 
> > fixup_owner() used to have additional seemingly relevant checks in place
> > that were removed 73d786bd043eb ("futex: Rework inconsistent
> > rt_mutex/futex_q state").
> 
> This and the subsequent changes moving some of this out from under the hb->lock
> are interesting - and were quite fun to review at the time. Hrm.
> 
> I'll continue paging this stuff in, although I suspect Peter will likely beat me
> to it. In the meantime, if you can share the reproducer and/or the trace you
> collected, that will be helpful.
> 
> > The actual kernel we've been testing is 4.9.33-rt23, w/ 153fbd1226fb3
> > ("futex: Fix more put_pi_state() vs. exit_pi_state_list() races")
> 
> And this does not exhibit the behavior above, correct?

Sorry if I was unclear.  This combination _does_ exhibit this incorrect
behavior.

> > cherry-picked w/ PREEMPT_RT_FULL.  However, it appears that this issue
> > may affect v4.15-rc1?
>
> And this does?

I only meant that: as far as I can tell the affected codepaths are
mostly the same between v4.9.33-rt23 and v4.15-rc1, as the futex
reworking stuff was cherry-picked back.

We haven't yet tried reproducing on v4.15-rc1, and aren't really at a
place where we can do so quickly.  It's unclear whether or not
PREEMPT_RT is required to reproduce.

Thanks!
   Julia

  reply	other threads:[~2017-12-01 21:49 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-29 17:56 PI futexes + lock stealing woes Julia Cartwright
2017-12-01 20:11 ` Darren Hart
2017-12-01 21:49   ` Julia Cartwright [this message]
2017-12-02  0:32     ` Darren Hart
2017-12-06 23:46 ` Peter Zijlstra
2017-12-07  2:09   ` Gratian Crisan
2017-12-07 10:45     ` Peter Zijlstra
2017-12-07 14:26       ` Peter Zijlstra
2017-12-07 14:57         ` Gratian Crisan
2017-12-07 19:50           ` Julia Cartwright
2017-12-07 23:02             ` Gratian Crisan
2017-12-08 12:49               ` [PATCH] futex: Avoid violating the 10th rule of futex Peter Zijlstra
2017-12-08 16:04                 ` Gratian Crisan
2018-01-08 21:09                 ` Julia Cartwright
2018-01-14 18:06                 ` [tip:locking/urgent] " tip-bot for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171201214901.GB32696@jcartwri.amer.corp.natinst.com \
    --to=julia@ni.com \
    --cc=dvhart@infradead.org \
    --cc=gratian.crisan@ni.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.