From: "tip-bot2 for Mike Galbraith" <tip-bot2@linutronix.de>
To: linux-tip-commits@vger.kernel.org
Cc: Gratian Crisan <gratian.crisan@ni.com>,
Mike Galbraith <efault@gmx.de>,
Thomas Gleixner <tglx@linutronix.de>,
stable@vger.kernel.org, x86@kernel.org,
linux-kernel@vger.kernel.org
Subject: [tip: locking/urgent] futex: Handle transient "ownerless" rtmutex state correctly
Date: Sat, 07 Nov 2020 21:08:21 -0000 [thread overview]
Message-ID: <160478330115.11244.17107406724104832304.tip-bot2@tip-bot2> (raw)
In-Reply-To: <87a6w6x7bb.fsf@ni.com>
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: 9f5d1c336a10c0d24e83e40b4c1b9539f7dba627
Gitweb: https://git.kernel.org/tip/9f5d1c336a10c0d24e83e40b4c1b9539f7dba627
Author: Mike Galbraith <efault@gmx.de>
AuthorDate: Wed, 04 Nov 2020 16:12:44 +01:00
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 07 Nov 2020 22:07:04 +01:00
futex: Handle transient "ownerless" rtmutex state correctly
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:
Task Prio Operation
T1 120 lock(F)
T2 120 lock(F) -> blocks (top waiter)
T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter)
XX timeout/ -> wakes T2
signal
T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter
and the lower priority T2 cannot steal the lock.
-> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()
The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.
The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().
Drop the locks, so that T3 can make progress, and then try the fixup again.
Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.
[ tglx: Wrote comment and changelog ]
Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
---
kernel/futex.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index f8614ef..ac32887 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2380,10 +2380,22 @@ retry:
}
/*
- * Since we just failed the trylock; there must be an owner.
+ * The trylock just failed, so either there is an owner or
+ * there is a higher priority waiter than this one.
*/
newowner = rt_mutex_owner(&pi_state->pi_mutex);
- BUG_ON(!newowner);
+ /*
+ * If the higher priority waiter has not yet taken over the
+ * rtmutex then newowner is NULL. We can't return here with
+ * that state because it's inconsistent vs. the user space
+ * state. So drop the locks and try again. It's a valid
+ * situation and not any different from the other retry
+ * conditions.
+ */
+ if (unlikely(!newowner)) {
+ err = -EAGAIN;
+ goto handle_err;
+ }
} else {
WARN_ON_ONCE(argowner != current);
if (oldowner == current) {
prev parent reply other threads:[~2020-11-07 21:08 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-10-28 23:24 BUG_ON(!newowner) in fixup_pi_state_owner() Gratian Crisan
2020-11-03 23:31 ` Gratian Crisan
2020-11-04 0:56 ` Mike Galbraith
2020-11-04 7:42 ` Mike Galbraith
2020-11-04 10:24 ` Thomas Gleixner
2020-11-04 11:17 ` Thomas Gleixner
2020-11-04 13:26 ` Thomas Gleixner
2020-11-04 13:43 ` Mike Galbraith
2020-11-04 15:12 ` [PATCH] futex: Handle transient "ownerless" rtmutex state correctly Thomas Gleixner
2020-11-04 15:22 ` Thomas Gleixner
2020-11-04 16:50 ` Mike Galbraith
2020-11-05 6:32 ` Gratian Crisan
2020-11-04 10:00 ` BUG_ON(!newowner) in fixup_pi_state_owner() Thomas Gleixner
2020-11-06 21:26 ` [tip: locking/urgent] futex: Handle transient "ownerless" rtmutex state correctly tip-bot2 for Mike Galbraith
2020-11-07 17:05 ` Mike Galbraith
2020-11-07 21:07 ` Thomas Gleixner
2020-11-07 21:08 ` tip-bot2 for Mike Galbraith [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=160478330115.11244.17107406724104832304.tip-bot2@tip-bot2 \
--to=tip-bot2@linutronix.de \
--cc=efault@gmx.de \
--cc=gratian.crisan@ni.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tip-commits@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.