All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gratian Crisan <gratian.crisan@ni.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Mike Galbraith <efault@gmx.de>,
	Gratian Crisan <gratian.crisan@ni.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org,
	Brandon Streiff <brandon.streiff@ni.com>,
	Ingo Molnar <mingo@redhat.com>,
	Darren Hart <dvhart@infradead.org>,
	James Minor <james.minor@ni.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH] futex: Handle transient "ownerless" rtmutex state correctly
Date: Thu, 05 Nov 2020 00:32:29 -0600	[thread overview]
Message-ID: <877dr0xqiq.fsf@ni.com> (raw)
In-Reply-To: <87sg9pkvf7.fsf@nanos.tec.linutronix.de>


Thomas Gleixner writes:

> From: Mike Galbraith <efault@gmx.de>
>
> Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
> This is one possible chain of events leading to this:
>
> Task Prio       Operation
> T1   120	lock(F)
> T2   120	lock(F)   -> blocks (top waiter)
> T3   50 (RT)	lock(F)   -> boosts T3 and blocks (new top waiter)
> XX   		timeout/  -> wakes T2
> 		signal
> T1   50		unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
> T2   120	cleanup   -> try_to_take_mutex() fails because T3 is the top waiter
>      			     and the lower priority T2 cannot steal the lock.
>      			  -> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()
>
> The comment states that this is invalid and rt_mutex_real_owner() must
> return a non NULL owner when the trylock failed, but in case of a queued
> and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
> state. The higher priority waiter has simply not yet managed to take over
> the rtmutex.
>
> The BUG_ON() is therefore wrong and this is just another retry condition in
> fixup_pi_state_owner().
>
> Drop the locks, so that T3 can make progress, and then try the fixup again.
>
> Gratian provided a great analysis, traces and a reproducer. The analysis is
> to the point, but it confused the hell out of that tglx dude who had to
> page in all the futex horrors again. Condensed version is above. 
>
> [ tglx: Wrote comment and changelog ]
>
> Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
> Reported-by: Gratian Crisan <gratian.crisan@ni.com>
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: stable@vger.kernel.org
> Link: https://urldefense.com/v3/__https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com__;!!FbZ0ZwI3Qg!5INAsNbAVSp3jaNkkjFazSRC86BpcZnVM3-oTDYl0KijU6jA5pWYk4KI79_L5F4$ 

LGTM, no crashes in my testing today.

-Gratian

> ---
>  kernel/futex.c |   16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2380,10 +2380,22 @@ static int fixup_pi_state_owner(u32 __us
>  		}
>  
>  		/*
> -		 * Since we just failed the trylock; there must be an owner.
> +		 * The trylock just failed, so either there is an owner or
> +		 * there is a higher priority waiter than this one.
>  		 */
>  		newowner = rt_mutex_owner(&pi_state->pi_mutex);
> -		BUG_ON(!newowner);
> +		/*
> +		 * If the higher priority waiter has not yet taken over the
> +		 * rtmutex then newowner is NULL. We can't return here with
> +		 * that state because it's inconsistent vs. the user space
> +		 * state. So drop the locks and try again. It's a valid
> +		 * situation and not any different from the other retry
> +		 * conditions.
> +		 */
> +		if (unlikely(!newowner)) {
> +			ret = -EAGAIN;
> +			goto handle_err;
> +		}
>  	} else {
>  		WARN_ON_ONCE(argowner != current);
>  		if (oldowner == current) {


  parent reply	other threads:[~2020-11-05  6:32 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-28 23:24 BUG_ON(!newowner) in fixup_pi_state_owner() Gratian Crisan
2020-11-03 23:31 ` Gratian Crisan
2020-11-04  0:56   ` Mike Galbraith
2020-11-04  7:42     ` Mike Galbraith
2020-11-04 10:24       ` Thomas Gleixner
2020-11-04 11:17         ` Thomas Gleixner
2020-11-04 13:26           ` Thomas Gleixner
2020-11-04 13:43             ` Mike Galbraith
2020-11-04 15:12               ` [PATCH] futex: Handle transient "ownerless" rtmutex state correctly Thomas Gleixner
2020-11-04 15:22                 ` Thomas Gleixner
2020-11-04 16:50                 ` Mike Galbraith
2020-11-05  6:32                 ` Gratian Crisan [this message]
2020-11-04 10:00   ` BUG_ON(!newowner) in fixup_pi_state_owner() Thomas Gleixner
2020-11-06 21:26 ` [tip: locking/urgent] futex: Handle transient "ownerless" rtmutex state correctly tip-bot2 for Mike Galbraith
2020-11-07 17:05   ` Mike Galbraith
2020-11-07 21:07     ` Thomas Gleixner
2020-11-07 21:08 ` tip-bot2 for Mike Galbraith

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877dr0xqiq.fsf@ni.com \
    --to=gratian.crisan@ni.com \
    --cc=bigeasy@linutronix.de \
    --cc=brandon.streiff@ni.com \
    --cc=dvhart@infradead.org \
    --cc=efault@gmx.de \
    --cc=james.minor@ni.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.