* [PATCH] Take over futex of dead task only if FUTEX_WAITERS is not set @ 2012-10-11 14:52 Siddhesh Poyarekar 2012-10-17 7:15 ` [PATCH RESEND] " Siddhesh Poyarekar 0 siblings, 1 reply; 13+ messages in thread From: Siddhesh Poyarekar @ 2012-10-11 14:52 UTC (permalink / raw) To: linux-kernel; +Cc: Thomas Gleixner, Darren Hart, Siddhesh Poyarekar In futex_lock_pi_atomic, we consider that if the value in the futex variable is 0 with additional flags, then it is safe for takeover since the owner of the futex is dead. However, when FUTEX_WAITERS is set in the futex value, handle_futex_death calls futex_wake to wake up one task. Hence the assumption in futex_lock_pi_atomic is not correct. The correct assumption is that a futex may be considered safe for a takeover if The FUTEX_OWNER_DIED bit is set, the TID bits are 0 and the FUTEX_WAITERS bit is not set. The race described above can be seen in the reproducer in the following glibc bug report: http://sourceware.org/bugzilla/show_bug.cgi?id=14076 Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- kernel/futex.c | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 3717e7b..9aa2d5a 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -760,9 +760,14 @@ retry: * case. We also do an unconditional take over, when the owner * of the futex died. * + * We do not take over the futex if FUTEX_WAITERS is set because we + * could end up waking two tasks, the current one and the one that the + * futex death event wakes in handle_futex_death. + * * This is safe as we are protected by the hash bucket lock ! */ - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { + if (unlikely(ownerdied || + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { /* Keep the OWNER_DIED bit */ newval = (curval & ~FUTEX_TID_MASK) | vpid; ownerdied = 0; -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH RESEND] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-11 14:52 [PATCH] Take over futex of dead task only if FUTEX_WAITERS is not set Siddhesh Poyarekar @ 2012-10-17 7:15 ` Siddhesh Poyarekar 2012-10-22 3:20 ` [PATCH] [RESEND 2] " Siddhesh Poyarekar 0 siblings, 1 reply; 13+ messages in thread From: Siddhesh Poyarekar @ 2012-10-17 7:15 UTC (permalink / raw) To: linux-kernel; +Cc: Thomas Gleixner, Darren Hart In futex_lock_pi_atomic, we consider that if the value in the futex variable is 0 with additional flags, then it is safe for takeover since the owner of the futex is dead. However, when FUTEX_WAITERS is set in the futex value, handle_futex_death calls futex_wake to wake up one task. Hence the assumption in futex_lock_pi_atomic is not correct. The correct assumption is that a futex may be considered safe for a takeover if The FUTEX_OWNER_DIED bit is set, the TID bits are 0 and the FUTEX_WAITERS bit is not set. The race described above can be seen in the reproducer in the following glibc bug report: http://sourceware.org/bugzilla/show_bug.cgi?id=14076 Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- kernel/futex.c | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 3717e7b..9aa2d5a 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -760,9 +760,14 @@ retry: * case. We also do an unconditional take over, when the owner * of the futex died. * + * We do not take over the futex if FUTEX_WAITERS is set because we + * could end up waking two tasks, the current one and the one that the + * futex death event wakes in handle_futex_death. + * * This is safe as we are protected by the hash bucket lock ! */ - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { + if (unlikely(ownerdied || + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { /* Keep the OWNER_DIED bit */ newval = (curval & ~FUTEX_TID_MASK) | vpid; ownerdied = 0; -- 1.7.7.6 -- http://siddhesh.in ^ permalink raw reply related [flat|nested] 13+ messages in thread
* [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-17 7:15 ` [PATCH RESEND] " Siddhesh Poyarekar @ 2012-10-22 3:20 ` Siddhesh Poyarekar 2012-10-23 14:04 ` Darren Hart 0 siblings, 1 reply; 13+ messages in thread From: Siddhesh Poyarekar @ 2012-10-22 3:20 UTC (permalink / raw) To: linux-kernel; +Cc: Thomas Gleixner, Darren Hart, Siddhesh Poyarekar In futex_lock_pi_atomic, we consider that if the value in the futex variable is 0 with additional flags, then it is safe for takeover since the owner of the futex is dead. However, when FUTEX_WAITERS is set in the futex value, handle_futex_death calls futex_wake to wake up one task. Hence the assumption in futex_lock_pi_atomic is not correct. The correct assumption is that a futex may be considered safe for a takeover if The FUTEX_OWNER_DIED bit is set, the TID bits are 0 and the FUTEX_WAITERS bit is not set. The race described above can be seen in the reproducer in the following glibc bug report: http://sourceware.org/bugzilla/show_bug.cgi?id=14076 Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> --- kernel/futex.c | 7 ++++++- 1 files changed, 6 insertions(+), 1 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 3717e7b..9aa2d5a 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -760,9 +760,14 @@ retry: * case. We also do an unconditional take over, when the owner * of the futex died. * + * We do not take over the futex if FUTEX_WAITERS is set because we + * could end up waking two tasks, the current one and the one that the + * futex death event wakes in handle_futex_death. + * * This is safe as we are protected by the hash bucket lock ! */ - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { + if (unlikely(ownerdied || + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { /* Keep the OWNER_DIED bit */ newval = (curval & ~FUTEX_TID_MASK) | vpid; ownerdied = 0; -- 1.7.7.6 ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-22 3:20 ` [PATCH] [RESEND 2] " Siddhesh Poyarekar @ 2012-10-23 14:04 ` Darren Hart 2012-10-23 20:29 ` Thomas Gleixner 0 siblings, 1 reply; 13+ messages in thread From: Darren Hart @ 2012-10-23 14:04 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: linux-kernel, Thomas Gleixner Hi Siddesh, Thanks for the patch and your work to isolate it in the glibc bug 14076. On 10/21/2012 08:20 PM, Siddhesh Poyarekar wrote: > In futex_lock_pi_atomic, we consider that if the value in the futex > variable is 0 with additional flags, then it is safe for takeover > since the owner of the futex is dead. However, when FUTEX_WAITERS is > set in the futex value, handle_futex_death calls futex_wake to wake up > one task. It shouldn't for PI mutexes. It should just set the FUTEX_OWNER_DIED flag, maintaining the FUTEX_WAITERS flag, and exit. int handle_futex_death(... ... /* * Wake robust non-PI futexes here. The wakeup of * PI futexes happens in exit_pi_state(): */ if (!pi && (uval & FUTEX_WAITERS)) futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); There may still be an issue, as the commentary around exit_pi_state_list() doesn't inspire confidence. -- Darren > Hence the assumption in futex_lock_pi_atomic is not correct. > The correct assumption is that a futex may be considered safe for a > takeover if The FUTEX_OWNER_DIED bit is set, the TID bits are 0 and > the FUTEX_WAITERS bit is not set. > > The race described above can be seen in the reproducer in the > following glibc bug report: > > http://sourceware.org/bugzilla/show_bug.cgi?id=14076 > > Signed-off-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> > --- > kernel/futex.c | 7 ++++++- > 1 files changed, 6 insertions(+), 1 deletions(-) > > diff --git a/kernel/futex.c b/kernel/futex.c > index 3717e7b..9aa2d5a 100644 > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -760,9 +760,14 @@ retry: > * case. We also do an unconditional take over, when the owner > * of the futex died. > * > + * We do not take over the futex if FUTEX_WAITERS is set because we > + * could end up waking two tasks, the current one and the one that the > + * futex death event wakes in handle_futex_death. > + * > * This is safe as we are protected by the hash bucket lock ! > */ > - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { > + if (unlikely(ownerdied || > + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { > /* Keep the OWNER_DIED bit */ > newval = (curval & ~FUTEX_TID_MASK) | vpid; > ownerdied = 0; > -- Darren Hart Intel Open Source Technology Center Yocto Project - Technical Lead - Linux Kernel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-23 14:04 ` Darren Hart @ 2012-10-23 20:29 ` Thomas Gleixner 2012-10-24 12:48 ` Siddhesh Poyarekar ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Thomas Gleixner @ 2012-10-23 20:29 UTC (permalink / raw) To: Darren Hart; +Cc: Siddhesh Poyarekar, LKML, Ingo Molnar, Peter Zijlstra Darren, Siddhesh, On Tue, 23 Oct 2012, Darren Hart wrote: > Hi Siddesh, > > Thanks for the patch and your work to isolate it in the glibc bug 14076. > > On 10/21/2012 08:20 PM, Siddhesh Poyarekar wrote: > > In futex_lock_pi_atomic, we consider that if the value in the futex > > variable is 0 with additional flags, then it is safe for takeover > > since the owner of the futex is dead. However, when FUTEX_WAITERS is > > set in the futex value, handle_futex_death calls futex_wake to wake up > > one task. > > It shouldn't for PI mutexes. It should just set the FUTEX_OWNER_DIED flag, > maintaining the FUTEX_WAITERS flag, and exit. > > int handle_futex_death(... > ... > /* > * Wake robust non-PI futexes here. The wakeup of > * PI futexes happens in exit_pi_state(): > */ > if (!pi && (uval & FUTEX_WAITERS)) > futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); Yes, the description of the problem is slightly wrong, but it still pinpoints the real wreckage. > > Hence the assumption in futex_lock_pi_atomic is not correct. > > The correct assumption is that a futex may be considered safe for a > > takeover if The FUTEX_OWNER_DIED bit is set, the TID bits are 0 and > > the FUTEX_WAITERS bit is not set. ... > > - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { > > + if (unlikely(ownerdied || > > + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { This solves the problem at hand, but I'm not too happy with the solution. One of the real possible scenarios which expose the problem is: Futex F is initialized with PTHREAD_PRIO_INHERIT and PTHREAD_MUTEX_ROBUST_NP attributes. T1 lock_futex_pi(F); T2 lock_futex_pi(F); --> T2 blocks on the futex and creates pi_state which is associated to T1. T1 exits --> exit_robust_list() runs --> Futex F userspace value TID field is set to 0 and FUTEX_OWNER_DIED bit is set. T3 lock_futex_pi(F); --> Succeeds due to the check for F's userspace TID field == 0 --> Claims ownership of the futex and sets its own TID into the userspace TID field of futex F --> returns to user space T1 --> exit_pi_state_list() --> Transfers pi_state to waiter T2 and wakes T2 via rt_mutex_unlock(&pi_state->mutex) T2 --> acquires pi_state->mutex and gains real ownership of the pi_state --> Claims ownership of the futex and sets its own TID into the userspace TID field of futex F --> returns to user space T3 --> observes inconsistent state This problem is independent of UP/SMP, preemptible/non preemptible kernels, or process shared vs. private. The only difference is that certain configurations are more likely to expose it. So as Siddhesh correctly analyzed the following check in futex_lock_pi_atomic() is the culprit: if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { We check the userspace value for a TID value of 0 and take over the futex unconditionally if that's true. AFAICT this check is there as it is correct for a different corner case of futexes: the WAITERS bit became stale. Now the proposed change - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { + if (unlikely(ownerdied || + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { solves the problem, but it's not obvious why and it wreckages the "stale WAITERS bit" case. What happens is, that due to the WAITERS bit being set (T2 is blocked on that futex) it enforces T3 to go through lookup_pi_state(), which in the above case returns an existing pi_state and therefor forces T3 to legitimately fight with T2 over the ownership of the pi_state (via pi_state->mutex). Probelm solved! Though that does not work for the "WAITERS bit is stale" problem because if lookup_pi_state() does not find existing pi_state it returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to return -ESRCH to user space because the OWNER_DIED bit is not set. Now there is a different solution to that problem. Do not look at the user space value at all and enforce a lookup of possibly available pi_state. If pi_state can be found, then the new incoming locker T3 blocks on that pi_state and legitimately races with T2 to acquire the rt_mutex and the pi_state and therefor the proper ownership of the user space futex. lookup_pi_state() has the correct order of checks. It first tries to find a pi_state associated with the user space futex and only if that fails it checks for futex TID value = 0. If no pi_state is available nothing can create new state at that point because this happens with the hash bucket lock held. So the above scenario changes to: T1 lock_futex_pi(F); T2 lock_futex_pi(F); --> T2 blocks on the futex and creates pi_state which is associated to T1. T1 exits --> exit_robust_list() runs --> Futex F userspace value TID field is set to 0 and FUTEX_OWNER_DIED bit is set. T3 lock_futex_pi(F); --> Finds pi_state and blocks on pi_state->rt_mutex T1 --> exit_pi_state_list() --> Transfers pi_state to waiter T2 and wakes it via rt_mutex_unlock(&pi_state->mutex) T2 --> acquires pi_state->mutex and gains ownership of the pi_state --> Claims ownership of the futex and sets its own TID into the userspace TID field of futex F --> returns to user space This covers all gazillion points on which T3 might come in between T1's exit_robust_list() clearing the TID field and T2 fixing it up. It also solves the "WAITERS bit stale" problem by forcing the take over. Another benefit of changing the code this way is that it makes it less dependent on untrusted user space values and therefor minimizes the possible wreckage which might be inflicted. As usual after staring for too long at the futex code my brain hurts so much that I really want to ditch that whole optimization of avoiding the syscall for the non contended case for PI futexes and rip out the maze of corner case handling code. Unfortunately we can't as user space relies on that existing behaviour, but at least thinking about it helps me to preserve my mental sanity. Maybe we should nevertheless :) Thanks, tglx --------------> Index: linux/kernel/futex.c =================================================================== --- linux.orig/kernel/futex.c +++ linux/kernel/futex.c @@ -716,7 +716,7 @@ static int futex_lock_pi_atomic(u32 __us struct futex_pi_state **ps, struct task_struct *task, int set_waiters) { - int lock_taken, ret, ownerdied = 0; + int lock_taken, ret, force_take = 0; u32 uval, newval, curval, vpid = task_pid_vnr(task); retry: @@ -755,17 +755,15 @@ retry: newval = curval | FUTEX_WAITERS; /* - * There are two cases, where a futex might have no owner (the - * owner TID is 0): OWNER_DIED. We take over the futex in this - * case. We also do an unconditional take over, when the owner - * of the futex died. - * - * This is safe as we are protected by the hash bucket lock ! + * Should we force take the futex? See below. */ - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { - /* Keep the OWNER_DIED bit */ + if (unlikely(force_take)) { + /* + * Keep the OWNER_DIED and the WAITERS bit and set the + * new TID value. + */ newval = (curval & ~FUTEX_TID_MASK) | vpid; - ownerdied = 0; + force_take = 0; lock_taken = 1; } @@ -775,7 +773,7 @@ retry: goto retry; /* - * We took the lock due to owner died take over. + * We took the lock due to forced take over. */ if (unlikely(lock_taken)) return 1; @@ -790,20 +788,25 @@ retry: switch (ret) { case -ESRCH: /* - * No owner found for this futex. Check if the - * OWNER_DIED bit is set to figure out whether - * this is a robust futex or not. + * We failed to find an owner for this + * futex. So we have no pi_state to block + * on. This can happen in two cases: + * + * 1) The owner died + * 2) A stale FUTEX_WAITERS bit + * + * Re-read the futex value. */ if (get_futex_value_locked(&curval, uaddr)) return -EFAULT; /* - * We simply start over in case of a robust - * futex. The code above will take the futex - * and return happy. + * If the owner died or we have a stale + * WAITERS bit the owner TID in the user space + * futex is 0. */ - if (curval & FUTEX_OWNER_DIED) { - ownerdied = 1; + if (!(curval & FUTEX_TID_MASK)) { + force_take = 1; goto retry; } default: ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-23 20:29 ` Thomas Gleixner @ 2012-10-24 12:48 ` Siddhesh Poyarekar 2012-10-24 18:08 ` Thomas Gleixner 2012-10-25 4:33 ` Darren Hart 2012-11-01 21:35 ` [tip:core/urgent] futex: Handle futex_pi OWNER_DIED take over correctly tip-bot for Thomas Gleixner 2 siblings, 1 reply; 13+ messages in thread From: Siddhesh Poyarekar @ 2012-10-24 12:48 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Darren Hart, LKML, Ingo Molnar, Peter Zijlstra > Now there is a different solution to that problem. Do not look at the > user space value at all and enforce a lookup of possibly available > pi_state. If pi_state can be found, then the new incoming locker T3 > blocks on that pi_state and legitimately races with T2 to acquire the > rt_mutex and the pi_state and therefor the proper ownership of the > user space futex. That works. Thanks for the detailed explanation too. -- http://siddhesh.in ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-24 12:48 ` Siddhesh Poyarekar @ 2012-10-24 18:08 ` Thomas Gleixner 2012-10-25 4:36 ` Darren Hart 0 siblings, 1 reply; 13+ messages in thread From: Thomas Gleixner @ 2012-10-24 18:08 UTC (permalink / raw) To: Siddhesh Poyarekar; +Cc: Darren Hart, LKML, Ingo Molnar, Peter Zijlstra On Wed, 24 Oct 2012, Siddhesh Poyarekar wrote: > > Now there is a different solution to that problem. Do not look at the > > user space value at all and enforce a lookup of possibly available > > pi_state. If pi_state can be found, then the new incoming locker T3 > > blocks on that pi_state and legitimately races with T2 to acquire the > > rt_mutex and the pi_state and therefor the proper ownership of the > > user space futex. > > That works. Thanks for the detailed explanation too. Thanks for the reproducer and finding the trouble spot in the first place! I'll queue that if Darren has no objections and mark it for stable as well. Thanks, tglx ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-24 18:08 ` Thomas Gleixner @ 2012-10-25 4:36 ` Darren Hart 2012-10-25 4:44 ` Siddhesh Poyarekar 0 siblings, 1 reply; 13+ messages in thread From: Darren Hart @ 2012-10-25 4:36 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Siddhesh Poyarekar, LKML, Ingo Molnar, Peter Zijlstra On 10/24/2012 11:08 AM, Thomas Gleixner wrote: > On Wed, 24 Oct 2012, Siddhesh Poyarekar wrote: > >>> Now there is a different solution to that problem. Do not look at the >>> user space value at all and enforce a lookup of possibly available >>> pi_state. If pi_state can be found, then the new incoming locker T3 >>> blocks on that pi_state and legitimately races with T2 to acquire the >>> rt_mutex and the pi_state and therefor the proper ownership of the >>> user space futex. >> >> That works. Thanks for the detailed explanation too. > > Thanks for the reproducer and finding the trouble spot in the first > place! Absolutely, that was great. Siddhesh, any objection to this test being incorporated into futextest? http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git;a=summary > I'll queue that if Darren has no objections and mark it for stable as > well. I would mostly like to understand the stale waiters case you mentioned. Otherwise, it seems sound - but changing what appears to be a workaround for an undocumented cornercase in code this complex does make me a bit nervous. I'd feel better if we could get Siddhesh's and this stale waiters covered in futextest. -- Darren Hart Intel Open Source Technology Center Yocto Project - Technical Lead - Linux Kernel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-25 4:36 ` Darren Hart @ 2012-10-25 4:44 ` Siddhesh Poyarekar 0 siblings, 0 replies; 13+ messages in thread From: Siddhesh Poyarekar @ 2012-10-25 4:44 UTC (permalink / raw) To: Darren Hart; +Cc: Thomas Gleixner, LKML, Ingo Molnar, Peter Zijlstra On 25 October 2012 10:06, Darren Hart <dvhart@linux.intel.com> wrote: > Absolutely, that was great. Siddhesh, any objection to this test being > incorporated into futextest? > > http://git.kernel.org/?p=linux/kernel/git/dvhart/futextest.git;a=summary > I have no objection to the test being incorporated into futextest. Thanks, Siddhesh -- http://siddhesh.in ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-23 20:29 ` Thomas Gleixner 2012-10-24 12:48 ` Siddhesh Poyarekar @ 2012-10-25 4:33 ` Darren Hart 2012-10-25 8:14 ` Thomas Gleixner 2012-11-01 21:35 ` [tip:core/urgent] futex: Handle futex_pi OWNER_DIED take over correctly tip-bot for Thomas Gleixner 2 siblings, 1 reply; 13+ messages in thread From: Darren Hart @ 2012-10-25 4:33 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Siddhesh Poyarekar, LKML, Ingo Molnar, Peter Zijlstra On 10/23/2012 01:29 PM, Thomas Gleixner wrote: > Darren, Siddhesh, > > On Tue, 23 Oct 2012, Darren Hart wrote: > >> Hi Siddesh, >> >> Thanks for the patch and your work to isolate it in the glibc bug 14076. >> >> On 10/21/2012 08:20 PM, Siddhesh Poyarekar wrote: >>> In futex_lock_pi_atomic, we consider that if the value in the futex >>> variable is 0 with additional flags, then it is safe for takeover >>> since the owner of the futex is dead. However, when FUTEX_WAITERS is >>> set in the futex value, handle_futex_death calls futex_wake to wake up >>> one task. >> >> It shouldn't for PI mutexes. It should just set the FUTEX_OWNER_DIED flag, >> maintaining the FUTEX_WAITERS flag, and exit. >> >> int handle_futex_death(... >> ... >> /* >> * Wake robust non-PI futexes here. The wakeup of >> * PI futexes happens in exit_pi_state(): >> */ >> if (!pi && (uval & FUTEX_WAITERS)) >> futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY); > > Yes, the description of the problem is slightly wrong, but it still > pinpoints the real wreckage. > >>> Hence the assumption in futex_lock_pi_atomic is not correct. >>> The correct assumption is that a futex may be considered safe for a >>> takeover if The FUTEX_OWNER_DIED bit is set, the TID bits are 0 and >>> the FUTEX_WAITERS bit is not set. > ... >>> - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { >>> + if (unlikely(ownerdied || >>> + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { > > This solves the problem at hand, but I'm not too happy with the > solution. One of the real possible scenarios which expose the problem > is: > > Futex F is initialized with PTHREAD_PRIO_INHERIT and > PTHREAD_MUTEX_ROBUST_NP attributes. > > T1 lock_futex_pi(F); > > T2 lock_futex_pi(F); > > --> T2 blocks on the futex and creates pi_state which is associated > to T1. > > T1 exits > > --> exit_robust_list() runs > > --> Futex F userspace value TID field is set to 0 and > FUTEX_OWNER_DIED bit is set. > > T3 lock_futex_pi(F); > > --> Succeeds due to the check for F's userspace TID field == 0 > > --> Claims ownership of the futex and sets its own TID into the > userspace TID field of futex F > > --> returns to user space > > T1 --> exit_pi_state_list() > > --> Transfers pi_state to waiter T2 and wakes T2 via > rt_mutex_unlock(&pi_state->mutex) > > T2 --> acquires pi_state->mutex and gains real ownership of the > pi_state > > --> Claims ownership of the futex and sets its own TID into the > userspace TID field of futex F > > --> returns to user space > > T3 --> observes inconsistent state > > This problem is independent of UP/SMP, preemptible/non preemptible > kernels, or process shared vs. private. The only difference is that > certain configurations are more likely to expose it. > > So as Siddhesh correctly analyzed the following check in > futex_lock_pi_atomic() is the culprit: > > if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { > > We check the userspace value for a TID value of 0 and take over the > futex unconditionally if that's true. > > AFAICT this check is there as it is correct for a different corner > case of futexes: the WAITERS bit became stale. > > Now the proposed change > > - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { > + if (unlikely(ownerdied || > + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { > > solves the problem, but it's not obvious why and it wreckages the > "stale WAITERS bit" case. In what scenario does the WAITERS bit become stale for pi futexes? This corner case seems rather core to your solution, so I would like to understand it a bit better. > > What happens is, that due to the WAITERS bit being set (T2 is blocked > on that futex) it enforces T3 to go through lookup_pi_state(), which > in the above case returns an existing pi_state and therefor forces T3 > to legitimately fight with T2 over the ownership of the pi_state (via > pi_state->mutex). Probelm solved! > > Though that does not work for the "WAITERS bit is stale" problem > because if lookup_pi_state() does not find existing pi_state it > returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to > return -ESRCH to user space because the OWNER_DIED bit is not set. > > Now there is a different solution to that problem. Do not look at the > user space value at all and enforce a lookup of possibly available > pi_state. If pi_state can be found, then the new incoming locker T3 > blocks on that pi_state and legitimately races with T2 to acquire the > rt_mutex and the pi_state and therefor the proper ownership of the > user space futex. My first concern here is performance impact by forcing the pi_state lookup, however, if we got this far, we already took the syscall, and our performance sucks anyway. Correctness obviously trumps performance here. > > lookup_pi_state() has the correct order of checks. It first tries to > find a pi_state associated with the user space futex and only if that > fails it checks for futex TID value = 0. If no pi_state is available > nothing can create new state at that point because this happens with > the hash bucket lock held. > > So the above scenario changes to: > > T1 lock_futex_pi(F); > > T2 lock_futex_pi(F); > > --> T2 blocks on the futex and creates pi_state which is associated > to T1. > > T1 exits > > --> exit_robust_list() runs > > --> Futex F userspace value TID field is set to 0 and > FUTEX_OWNER_DIED bit is set. > > T3 lock_futex_pi(F); > > --> Finds pi_state and blocks on pi_state->rt_mutex > > T1 --> exit_pi_state_list() > > --> Transfers pi_state to waiter T2 and wakes it via > rt_mutex_unlock(&pi_state->mutex) > > T2 --> acquires pi_state->mutex and gains ownership of the pi_state > > --> Claims ownership of the futex and sets its own TID into the > userspace TID field of futex F > > --> returns to user space > > This covers all gazillion points on which T3 might come in between > T1's exit_robust_list() clearing the TID field and T2 fixing it up. It > also solves the "WAITERS bit stale" problem by forcing the take over. > > Another benefit of changing the code this way is that it makes it less > dependent on untrusted user space values and therefor minimizes the > possible wreckage which might be inflicted. That's a definite plus! > As usual after staring for too long at the futex code my brain hurts > so much that I really want to ditch that whole optimization of > avoiding the syscall for the non contended case for PI futexes and rip > out the maze of corner case handling code. Unfortunately we can't as > user space relies on that existing behaviour, but at least thinking > about it helps me to preserve my mental sanity. Maybe we should > nevertheless :) I was surprised at how fast you were able to page all this in after all that travel - or is this what you did for 12 hours on the plane? :-) -- Darren > Thanks, > > tglx > > --------------> > > Index: linux/kernel/futex.c > =================================================================== > --- linux.orig/kernel/futex.c > +++ linux/kernel/futex.c > @@ -716,7 +716,7 @@ static int futex_lock_pi_atomic(u32 __us > struct futex_pi_state **ps, > struct task_struct *task, int set_waiters) > { > - int lock_taken, ret, ownerdied = 0; > + int lock_taken, ret, force_take = 0; > u32 uval, newval, curval, vpid = task_pid_vnr(task); > > retry: > @@ -755,17 +755,15 @@ retry: > newval = curval | FUTEX_WAITERS; > > /* > - * There are two cases, where a futex might have no owner (the > - * owner TID is 0): OWNER_DIED. We take over the futex in this > - * case. We also do an unconditional take over, when the owner > - * of the futex died. > - * > - * This is safe as we are protected by the hash bucket lock ! > + * Should we force take the futex? See below. > */ > - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { > - /* Keep the OWNER_DIED bit */ > + if (unlikely(force_take)) { > + /* > + * Keep the OWNER_DIED and the WAITERS bit and set the > + * new TID value. > + */ > newval = (curval & ~FUTEX_TID_MASK) | vpid; > - ownerdied = 0; > + force_take = 0; > lock_taken = 1; > } > > @@ -775,7 +773,7 @@ retry: > goto retry; > > /* > - * We took the lock due to owner died take over. > + * We took the lock due to forced take over. > */ > if (unlikely(lock_taken)) > return 1; > @@ -790,20 +788,25 @@ retry: > switch (ret) { > case -ESRCH: > /* > - * No owner found for this futex. Check if the > - * OWNER_DIED bit is set to figure out whether > - * this is a robust futex or not. > + * We failed to find an owner for this > + * futex. So we have no pi_state to block > + * on. This can happen in two cases: > + * > + * 1) The owner died > + * 2) A stale FUTEX_WAITERS bit > + * > + * Re-read the futex value. > */ > if (get_futex_value_locked(&curval, uaddr)) > return -EFAULT; > > /* > - * We simply start over in case of a robust > - * futex. The code above will take the futex > - * and return happy. > + * If the owner died or we have a stale > + * WAITERS bit the owner TID in the user space > + * futex is 0. > */ > - if (curval & FUTEX_OWNER_DIED) { > - ownerdied = 1; > + if (!(curval & FUTEX_TID_MASK)) { > + force_take = 1; > goto retry; > } > default: > -- Darren Hart Intel Open Source Technology Center Yocto Project - Technical Lead - Linux Kernel ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-25 4:33 ` Darren Hart @ 2012-10-25 8:14 ` Thomas Gleixner 2012-10-25 8:18 ` Darren Hart 0 siblings, 1 reply; 13+ messages in thread From: Thomas Gleixner @ 2012-10-25 8:14 UTC (permalink / raw) To: Darren Hart; +Cc: Siddhesh Poyarekar, LKML, Ingo Molnar, Peter Zijlstra On Wed, 24 Oct 2012, Darren Hart wrote: > On 10/23/2012 01:29 PM, Thomas Gleixner wrote: > > Now the proposed change > > > > - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { > > + if (unlikely(ownerdied || > > + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { > > > > solves the problem, but it's not obvious why and it wreckages the > > "stale WAITERS bit" case. > > > In what scenario does the WAITERS bit become stale for pi futexes? This > corner case seems rather core to your solution, so I would like to > understand it a bit better. Hmm. I can't reconstruct the scenario anymore, though this has been an issue some time ago. Either we fixed up that case in course of the big restructuring of the code or it's just there and I cant decode it. Brain hurts. > > Now there is a different solution to that problem. Do not look at the > > user space value at all and enforce a lookup of possibly available > > pi_state. If pi_state can be found, then the new incoming locker T3 > > blocks on that pi_state and legitimately races with T2 to acquire the > > rt_mutex and the pi_state and therefor the proper ownership of the > > user space futex. > > My first concern here is performance impact by forcing the pi_state > lookup, however, if we got this far, we already took the syscall, and > our performance sucks anyway. Correctness obviously trumps performance here. Right, the pi_state lookup is cheap compared to the syscall, locking .... And even without the stale WAITERS bit issue, the bit below is a good reason to do it this way. > > Another benefit of changing the code this way is that it makes it less > > dependent on untrusted user space values and therefor minimizes the > > possible wreckage which might be inflicted. > > That's a definite plus! Indeed. > I was surprised at how fast you were able to page all this in after all > that travel - or is this what you did for 12 hours on the plane? Nah. I'm just too paranoid to apply any futex patch w/o understanding the root cause of it. Darn, if I only could remember how that stale waiters bit issue got inflicted .... Thanks, tglx ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] [RESEND 2] Take over futex of dead task only if FUTEX_WAITERS is not set 2012-10-25 8:14 ` Thomas Gleixner @ 2012-10-25 8:18 ` Darren Hart 0 siblings, 0 replies; 13+ messages in thread From: Darren Hart @ 2012-10-25 8:18 UTC (permalink / raw) To: Thomas Gleixner; +Cc: Siddhesh Poyarekar, LKML, Ingo Molnar, Peter Zijlstra On 10/25/2012 01:14 AM, Thomas Gleixner wrote: > On Wed, 24 Oct 2012, Darren Hart wrote: >> On 10/23/2012 01:29 PM, Thomas Gleixner wrote: > Nah. I'm just too paranoid to apply any futex patch w/o understanding > the root cause of it. Darn, if I only could remember how that stale > waiters bit issue got inflicted .... Stale waiters happens in userspace for non-pi mutexes and there we make one extra FUTEX_WAKE syscall just in case. The futex value policy is much more rigid with PI obviously, so I wasn't aware this could happen there. -- Darren Hart Intel Open Source Technology Center Yocto Project - Technical Lead - Linux Kernel ^ permalink raw reply [flat|nested] 13+ messages in thread
* [tip:core/urgent] futex: Handle futex_pi OWNER_DIED take over correctly 2012-10-23 20:29 ` Thomas Gleixner 2012-10-24 12:48 ` Siddhesh Poyarekar 2012-10-25 4:33 ` Darren Hart @ 2012-11-01 21:35 ` tip-bot for Thomas Gleixner 2 siblings, 0 replies; 13+ messages in thread From: tip-bot for Thomas Gleixner @ 2012-11-01 21:35 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, dvhart, siddhesh.poyarekar, tglx Commit-ID: 59fa6245192159ab5e1e17b8e31f15afa9cff4bf Gitweb: http://git.kernel.org/tip/59fa6245192159ab5e1e17b8e31f15afa9cff4bf Author: Thomas Gleixner <tglx@linutronix.de> AuthorDate: Tue, 23 Oct 2012 22:29:38 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 1 Nov 2012 12:06:54 +0100 futex: Handle futex_pi OWNER_DIED take over correctly Siddhesh analyzed a failure in the take over of pi futexes in case the owner died and provided a workaround. See: http://sourceware.org/bugzilla/show_bug.cgi?id=14076 The detailed problem analysis shows: Futex F is initialized with PTHREAD_PRIO_INHERIT and PTHREAD_MUTEX_ROBUST_NP attributes. T1 lock_futex_pi(F); T2 lock_futex_pi(F); --> T2 blocks on the futex and creates pi_state which is associated to T1. T1 exits --> exit_robust_list() runs --> Futex F userspace value TID field is set to 0 and FUTEX_OWNER_DIED bit is set. T3 lock_futex_pi(F); --> Succeeds due to the check for F's userspace TID field == 0 --> Claims ownership of the futex and sets its own TID into the userspace TID field of futex F --> returns to user space T1 --> exit_pi_state_list() --> Transfers pi_state to waiter T2 and wakes T2 via rt_mutex_unlock(&pi_state->mutex) T2 --> acquires pi_state->mutex and gains real ownership of the pi_state --> Claims ownership of the futex and sets its own TID into the userspace TID field of futex F --> returns to user space T3 --> observes inconsistent state This problem is independent of UP/SMP, preemptible/non preemptible kernels, or process shared vs. private. The only difference is that certain configurations are more likely to expose it. So as Siddhesh correctly analyzed the following check in futex_lock_pi_atomic() is the culprit: if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { We check the userspace value for a TID value of 0 and take over the futex unconditionally if that's true. AFAICT this check is there as it is correct for a different corner case of futexes: the WAITERS bit became stale. Now the proposed change - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { + if (unlikely(ownerdied || + !(curval & (FUTEX_TID_MASK | FUTEX_WAITERS)))) { solves the problem, but it's not obvious why and it wreckages the "stale WAITERS bit" case. What happens is, that due to the WAITERS bit being set (T2 is blocked on that futex) it enforces T3 to go through lookup_pi_state(), which in the above case returns an existing pi_state and therefor forces T3 to legitimately fight with T2 over the ownership of the pi_state (via pi_state->mutex). Probelm solved! Though that does not work for the "WAITERS bit is stale" problem because if lookup_pi_state() does not find existing pi_state it returns -ERSCH (due to TID == 0) which causes futex_lock_pi() to return -ESRCH to user space because the OWNER_DIED bit is not set. Now there is a different solution to that problem. Do not look at the user space value at all and enforce a lookup of possibly available pi_state. If pi_state can be found, then the new incoming locker T3 blocks on that pi_state and legitimately races with T2 to acquire the rt_mutex and the pi_state and therefor the proper ownership of the user space futex. lookup_pi_state() has the correct order of checks. It first tries to find a pi_state associated with the user space futex and only if that fails it checks for futex TID value = 0. If no pi_state is available nothing can create new state at that point because this happens with the hash bucket lock held. So the above scenario changes to: T1 lock_futex_pi(F); T2 lock_futex_pi(F); --> T2 blocks on the futex and creates pi_state which is associated to T1. T1 exits --> exit_robust_list() runs --> Futex F userspace value TID field is set to 0 and FUTEX_OWNER_DIED bit is set. T3 lock_futex_pi(F); --> Finds pi_state and blocks on pi_state->rt_mutex T1 --> exit_pi_state_list() --> Transfers pi_state to waiter T2 and wakes it via rt_mutex_unlock(&pi_state->mutex) T2 --> acquires pi_state->mutex and gains ownership of the pi_state --> Claims ownership of the futex and sets its own TID into the userspace TID field of futex F --> returns to user space This covers all gazillion points on which T3 might come in between T1's exit_robust_list() clearing the TID field and T2 fixing it up. It also solves the "WAITERS bit stale" problem by forcing the take over. Another benefit of changing the code this way is that it makes it less dependent on untrusted user space values and therefor minimizes the possible wreckage which might be inflicted. As usual after staring for too long at the futex code my brain hurts so much that I really want to ditch that whole optimization of avoiding the syscall for the non contended case for PI futexes and rip out the maze of corner case handling code. Unfortunately we can't as user space relies on that existing behaviour, but at least thinking about it helps me to preserve my mental sanity. Maybe we should nevertheless :) Reported-and-tested-by: Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com> Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1210232138540.2756@ionos Acked-by: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/futex.c | 41 ++++++++++++++++++++++------------------- 1 files changed, 22 insertions(+), 19 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 3717e7b..20ef219 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -716,7 +716,7 @@ static int futex_lock_pi_atomic(u32 __user *uaddr, struct futex_hash_bucket *hb, struct futex_pi_state **ps, struct task_struct *task, int set_waiters) { - int lock_taken, ret, ownerdied = 0; + int lock_taken, ret, force_take = 0; u32 uval, newval, curval, vpid = task_pid_vnr(task); retry: @@ -755,17 +755,15 @@ retry: newval = curval | FUTEX_WAITERS; /* - * There are two cases, where a futex might have no owner (the - * owner TID is 0): OWNER_DIED. We take over the futex in this - * case. We also do an unconditional take over, when the owner - * of the futex died. - * - * This is safe as we are protected by the hash bucket lock ! + * Should we force take the futex? See below. */ - if (unlikely(ownerdied || !(curval & FUTEX_TID_MASK))) { - /* Keep the OWNER_DIED bit */ + if (unlikely(force_take)) { + /* + * Keep the OWNER_DIED and the WAITERS bit and set the + * new TID value. + */ newval = (curval & ~FUTEX_TID_MASK) | vpid; - ownerdied = 0; + force_take = 0; lock_taken = 1; } @@ -775,7 +773,7 @@ retry: goto retry; /* - * We took the lock due to owner died take over. + * We took the lock due to forced take over. */ if (unlikely(lock_taken)) return 1; @@ -790,20 +788,25 @@ retry: switch (ret) { case -ESRCH: /* - * No owner found for this futex. Check if the - * OWNER_DIED bit is set to figure out whether - * this is a robust futex or not. + * We failed to find an owner for this + * futex. So we have no pi_state to block + * on. This can happen in two cases: + * + * 1) The owner died + * 2) A stale FUTEX_WAITERS bit + * + * Re-read the futex value. */ if (get_futex_value_locked(&curval, uaddr)) return -EFAULT; /* - * We simply start over in case of a robust - * futex. The code above will take the futex - * and return happy. + * If the owner died or we have a stale + * WAITERS bit the owner TID in the user space + * futex is 0. */ - if (curval & FUTEX_OWNER_DIED) { - ownerdied = 1; + if (!(curval & FUTEX_TID_MASK)) { + force_take = 1; goto retry; } default: ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2012-11-01 21:35 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-10-11 14:52 [PATCH] Take over futex of dead task only if FUTEX_WAITERS is not set Siddhesh Poyarekar 2012-10-17 7:15 ` [PATCH RESEND] " Siddhesh Poyarekar 2012-10-22 3:20 ` [PATCH] [RESEND 2] " Siddhesh Poyarekar 2012-10-23 14:04 ` Darren Hart 2012-10-23 20:29 ` Thomas Gleixner 2012-10-24 12:48 ` Siddhesh Poyarekar 2012-10-24 18:08 ` Thomas Gleixner 2012-10-25 4:36 ` Darren Hart 2012-10-25 4:44 ` Siddhesh Poyarekar 2012-10-25 4:33 ` Darren Hart 2012-10-25 8:14 ` Thomas Gleixner 2012-10-25 8:18 ` Darren Hart 2012-11-01 21:35 ` [tip:core/urgent] futex: Handle futex_pi OWNER_DIED take over correctly tip-bot for Thomas Gleixner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).