All of lore.kernel.org
 help / color / mirror / Atom feed
From: "tip-bot2 for Mike Galbraith" <tip-bot2@linutronix.de>
To: linux-tip-commits@vger.kernel.org
Cc: Gratian Crisan <gratian.crisan@ni.com>,
	Mike Galbraith <efault@gmx.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	stable@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org
Subject: [tip: locking/urgent] futex: Handle transient "ownerless" rtmutex state correctly
Date: Sat, 07 Nov 2020 21:08:21 -0000	[thread overview]
Message-ID: <160478330115.11244.17107406724104832304.tip-bot2@tip-bot2> (raw)
In-Reply-To: <87a6w6x7bb.fsf@ni.com>

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     9f5d1c336a10c0d24e83e40b4c1b9539f7dba627
Gitweb:        https://git.kernel.org/tip/9f5d1c336a10c0d24e83e40b4c1b9539f7dba627
Author:        Mike Galbraith <efault@gmx.de>
AuthorDate:    Wed, 04 Nov 2020 16:12:44 +01:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Sat, 07 Nov 2020 22:07:04 +01:00

futex: Handle transient "ownerless" rtmutex state correctly

Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:

Task Prio       Operation
T1   120	lock(F)
T2   120	lock(F)   -> blocks (top waiter)
T3   50 (RT)	lock(F)   -> boosts T1 and blocks (new top waiter)
XX   		timeout/  -> wakes T2
		signal
T1   50		unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2   120	cleanup   -> try_to_take_mutex() fails because T3 is the top waiter
     			     and the lower priority T2 cannot steal the lock.
     			  -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()

The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.

The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().

Drop the locks, so that T3 can make progress, and then try the fixup again.

Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.

[ tglx: Wrote comment and changelog ]

Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
---
 kernel/futex.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index f8614ef..ac32887 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2380,10 +2380,22 @@ retry:
 		}
 
 		/*
-		 * Since we just failed the trylock; there must be an owner.
+		 * The trylock just failed, so either there is an owner or
+		 * there is a higher priority waiter than this one.
 		 */
 		newowner = rt_mutex_owner(&pi_state->pi_mutex);
-		BUG_ON(!newowner);
+		/*
+		 * If the higher priority waiter has not yet taken over the
+		 * rtmutex then newowner is NULL. We can't return here with
+		 * that state because it's inconsistent vs. the user space
+		 * state. So drop the locks and try again. It's a valid
+		 * situation and not any different from the other retry
+		 * conditions.
+		 */
+		if (unlikely(!newowner)) {
+			err = -EAGAIN;
+			goto handle_err;
+		}
 	} else {
 		WARN_ON_ONCE(argowner != current);
 		if (oldowner == current) {

      parent reply	other threads:[~2020-11-07 21:08 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-28 23:24 BUG_ON(!newowner) in fixup_pi_state_owner() Gratian Crisan
2020-11-03 23:31 ` Gratian Crisan
2020-11-04  0:56   ` Mike Galbraith
2020-11-04  7:42     ` Mike Galbraith
2020-11-04 10:24       ` Thomas Gleixner
2020-11-04 11:17         ` Thomas Gleixner
2020-11-04 13:26           ` Thomas Gleixner
2020-11-04 13:43             ` Mike Galbraith
2020-11-04 15:12               ` [PATCH] futex: Handle transient "ownerless" rtmutex state correctly Thomas Gleixner
2020-11-04 15:22                 ` Thomas Gleixner
2020-11-04 16:50                 ` Mike Galbraith
2020-11-05  6:32                 ` Gratian Crisan
2020-11-04 10:00   ` BUG_ON(!newowner) in fixup_pi_state_owner() Thomas Gleixner
2020-11-06 21:26 ` [tip: locking/urgent] futex: Handle transient "ownerless" rtmutex state correctly tip-bot2 for Mike Galbraith
2020-11-07 17:05   ` Mike Galbraith
2020-11-07 21:07     ` Thomas Gleixner
2020-11-07 21:08 ` tip-bot2 for Mike Galbraith [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=160478330115.11244.17107406724104832304.tip-bot2@tip-bot2 \
    --to=tip-bot2@linutronix.de \
    --cc=efault@gmx.de \
    --cc=gratian.crisan@ni.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-tip-commits@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.