All of lore.kernel.org
 help / color / mirror / Atom feed
* [bug report] locking/rtmutex: Return success on deadlock for ww_mutex waiters
@ 2021-08-31  8:21 Dan Carpenter
  2021-09-01  8:09 ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Dan Carpenter @ 2021-08-31  8:21 UTC (permalink / raw)
  To: peterz; +Cc: kernel-janitors

Hello Peter Zijlstra,

This is a semi-automatic email about new static checker warnings.

The patch a055fcc132d4: "locking/rtmutex: Return success on deadlock
for ww_mutex waiters" from Aug 26, 2021, leads to the following
Smatch complaint:

    kernel/locking/rtmutex.c:756 rt_mutex_adjust_prio_chain()
    error: we previously assumed 'orig_waiter' could be null (see line 644)

kernel/locking/rtmutex.c
   643		 */
   644		if (orig_waiter && !rt_mutex_owner(orig_lock))
                    ^^^^^^^^^^^
A lot of this code assumes "orig_waiter" can be NULL.

   645			goto out_unlock_pi;
   646	
   647		/*
   648		 * We dropped all locks after taking a refcount on @task, so
   649		 * the task might have moved on in the lock chain or even left
   650		 * the chain completely and blocks now on an unrelated lock or
   651		 * on @orig_lock.
   652		 *
   653		 * We stored the lock on which @task was blocked in @next_lock,
   654		 * so we can detect the chain change.
   655		 */
   656		if (next_lock != waiter->lock)
   657			goto out_unlock_pi;
   658	
   659		/*
   660		 * There could be 'spurious' loops in the lock graph due to ww_mutex,
   661		 * consider:
   662		 *
   663		 *   P1: A, ww_A, ww_B
   664		 *   P2: ww_B, ww_A
   665		 *   P3: A
   666		 *
   667		 * P3 should not return -EDEADLK because it gets trapped in the cycle
   668		 * created by P1 and P2 (which will resolve -- and runs into
   669		 * max_lock_depth above). Therefore disable detect_deadlock such that
   670		 * the below termination condition can trigger once all relevant tasks
   671		 * are boosted.
   672		 *
   673		 * Even when we start with ww_mutex we can disable deadlock detection,
   674		 * since we would supress a ww_mutex induced deadlock at [6] anyway.
   675		 * Supressing it here however is not sufficient since we might still
   676		 * hit [6] due to adjustment driven iteration.
   677		 *
   678		 * NOTE: if someone were to create a deadlock between 2 ww_classes we'd
   679		 * utterly fail to report it; lockdep should.
   680		 */
   681		if (IS_ENABLED(CONFIG_PREEMPT_RT) && waiter->ww_ctx && detect_deadlock)
   682			detect_deadlock = false;
   683	
   684		/*
   685		 * Drop out, when the task has no waiters. Note,
   686		 * top_waiter can be NULL, when we are in the deboosting
   687		 * mode!
   688		 */
   689		if (top_waiter) {
   690			if (!task_has_pi_waiters(task))
   691				goto out_unlock_pi;
   692			/*
   693			 * If deadlock detection is off, we stop here if we
   694			 * are not the top pi waiter of the task. If deadlock
   695			 * detection is enabled we continue, but stop the
   696			 * requeueing in the chain walk.
   697			 */
   698			if (top_waiter != task_top_pi_waiter(task)) {
   699				if (!detect_deadlock)
   700					goto out_unlock_pi;
   701				else
   702					requeue = false;
   703			}
   704		}
   705	
   706		/*
   707		 * If the waiter priority is the same as the task priority
   708		 * then there is no further priority adjustment necessary.  If
   709		 * deadlock detection is off, we stop the chain walk. If its
   710		 * enabled we continue, but stop the requeueing in the chain
   711		 * walk.
   712		 */
   713		if (rt_mutex_waiter_equal(waiter, task_to_waiter(task))) {
   714			if (!detect_deadlock)
   715				goto out_unlock_pi;
   716			else
   717				requeue = false;
   718		}
   719	
   720		/*
   721		 * [4] Get the next lock
   722		 */
   723		lock = waiter->lock;
   724		/*
   725		 * [5] We need to trylock here as we are holding task->pi_lock,
   726		 * which is the reverse lock order versus the other rtmutex
   727		 * operations.
   728		 */
   729		if (!raw_spin_trylock(&lock->wait_lock)) {
   730			raw_spin_unlock_irq(&task->pi_lock);
   731			cpu_relax();
   732			goto retry;
   733		}
   734	
   735		/*
   736		 * [6] check_exit_conditions_2() protected by task->pi_lock and
   737		 * lock->wait_lock.
   738		 *
   739		 * Deadlock detection. If the lock is the same as the original
   740		 * lock which caused us to walk the lock chain or if the
   741		 * current lock is owned by the task which initiated the chain
   742		 * walk, we detected a deadlock.
   743		 */
   744		if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
                    ^^^^^^^^^^^^^^^^^
This might mean it's a false positive, but Smatch isn't clever enough to
figure it out.  And I'm stupid too!  Plus lazy...  and ugly.

   745			ret = -EDEADLK;
   746	
   747			/*
   748			 * When the deadlock is due to ww_mutex; also see above. Don't
   749			 * report the deadlock and instead let the ww_mutex wound/die
   750			 * logic pick which of the contending threads gets -EDEADLK.
   751			 *
   752			 * NOTE: assumes the cycle only contains a single ww_class; any
   753			 * other configuration and we fail to report; also, see
   754			 * lockdep.
   755			 */
   756			if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter->ww_ctx)
                                                             ^^^^^^^^^^^^^^^^^^^
Unchecked dereference.

   757				ret = 0;
   758	

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [bug report] locking/rtmutex: Return success on deadlock for ww_mutex waiters
  2021-08-31  8:21 [bug report] locking/rtmutex: Return success on deadlock for ww_mutex waiters Dan Carpenter
@ 2021-09-01  8:09 ` Peter Zijlstra
  2021-09-01  9:44   ` [PATCH] locking/rtmutex: Fix ww_mutex deadlock check Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2021-09-01  8:09 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: kernel-janitors, Thomas Gleixner, linux-kernel,
	Sebastian Andrzej Siewior

On Tue, Aug 31, 2021 at 11:21:52AM +0300, Dan Carpenter wrote:
> Hello Peter Zijlstra,

Hi Dan :-)

> This is a semi-automatic email about new static checker warnings.
> 
> The patch a055fcc132d4: "locking/rtmutex: Return success on deadlock
> for ww_mutex waiters" from Aug 26, 2021, leads to the following
> Smatch complaint:
> 
>     kernel/locking/rtmutex.c:756 rt_mutex_adjust_prio_chain()
>     error: we previously assumed 'orig_waiter' could be null (see line 644)
> 
> kernel/locking/rtmutex.c
>    643		 */
>    644		if (orig_waiter && !rt_mutex_owner(orig_lock))
>                     ^^^^^^^^^^^
> A lot of this code assumes "orig_waiter" can be NULL.
> 

>    735		/*
>    736		 * [6] check_exit_conditions_2() protected by task->pi_lock and
>    737		 * lock->wait_lock.
>    738		 *
>    739		 * Deadlock detection. If the lock is the same as the original
>    740		 * lock which caused us to walk the lock chain or if the
>    741		 * current lock is owned by the task which initiated the chain
>    742		 * walk, we detected a deadlock.
>    743		 */
>    744		if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
>                     ^^^^^^^^^^^^^^^^^
> This might mean it's a false positive, but Smatch isn't clever enough to
> figure it out.  And I'm stupid too!  Plus lazy...  and ugly.
> 
>    745			ret = -EDEADLK;
>    746	
>    747			/*
>    748			 * When the deadlock is due to ww_mutex; also see above. Don't
>    749			 * report the deadlock and instead let the ww_mutex wound/die
>    750			 * logic pick which of the contending threads gets -EDEADLK.
>    751			 *
>    752			 * NOTE: assumes the cycle only contains a single ww_class; any
>    753			 * other configuration and we fail to report; also, see
>    754			 * lockdep.
>    755			 */
>    756			if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter->ww_ctx)
>                                                              ^^^^^^^^^^^^^^^^^^^
> Unchecked dereference.


This is difficult... and I'm glad you flagged it. The normal de-boost
path is through rt_mutex_adjust_prio() and that has: .orig_lock == NULL
&& .orig_waiter == NULL. And as such it would never trigger the above
case.

However, there is remove_waiter() which is called on rt_mutex_lock()'s
failure paths and that doesn't have .orig_lock == NULL, and as such
*could* conceivably trigger this.

Let me figure out what the right thing to do is.

Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH] locking/rtmutex: Fix ww_mutex deadlock check
  2021-09-01  8:09 ` Peter Zijlstra
@ 2021-09-01  9:44   ` Peter Zijlstra
  2021-09-06 16:51     ` Sebastian Andrzej Siewior
  2021-09-09  8:34     ` [tip: locking/urgent] " tip-bot2 for Peter Zijlstra
  0 siblings, 2 replies; 5+ messages in thread
From: Peter Zijlstra @ 2021-09-01  9:44 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: kernel-janitors, Thomas Gleixner, linux-kernel,
	Sebastian Andrzej Siewior

On Wed, Sep 01, 2021 at 10:09:43AM +0200, Peter Zijlstra wrote:
> On Tue, Aug 31, 2021 at 11:21:52AM +0300, Dan Carpenter wrote:
> > Hello Peter Zijlstra,
> 
> Hi Dan :-)
> 
> > This is a semi-automatic email about new static checker warnings.
> > 
> > The patch a055fcc132d4: "locking/rtmutex: Return success on deadlock
> > for ww_mutex waiters" from Aug 26, 2021, leads to the following
> > Smatch complaint:
> > 
> >     kernel/locking/rtmutex.c:756 rt_mutex_adjust_prio_chain()
> >     error: we previously assumed 'orig_waiter' could be null (see line 644)
> > 
> > kernel/locking/rtmutex.c
> >    643		 */
> >    644		if (orig_waiter && !rt_mutex_owner(orig_lock))
> >                     ^^^^^^^^^^^
> > A lot of this code assumes "orig_waiter" can be NULL.
> > 
> 
> >    735		/*
> >    736		 * [6] check_exit_conditions_2() protected by task->pi_lock and
> >    737		 * lock->wait_lock.
> >    738		 *
> >    739		 * Deadlock detection. If the lock is the same as the original
> >    740		 * lock which caused us to walk the lock chain or if the
> >    741		 * current lock is owned by the task which initiated the chain
> >    742		 * walk, we detected a deadlock.
> >    743		 */
> >    744		if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> >                     ^^^^^^^^^^^^^^^^^
> > This might mean it's a false positive, but Smatch isn't clever enough to
> > figure it out.  And I'm stupid too!  Plus lazy...  and ugly.
> > 
> >    745			ret = -EDEADLK;
> >    746	
> >    747			/*
> >    748			 * When the deadlock is due to ww_mutex; also see above. Don't
> >    749			 * report the deadlock and instead let the ww_mutex wound/die
> >    750			 * logic pick which of the contending threads gets -EDEADLK.
> >    751			 *
> >    752			 * NOTE: assumes the cycle only contains a single ww_class; any
> >    753			 * other configuration and we fail to report; also, see
> >    754			 * lockdep.
> >    755			 */
> >    756			if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter->ww_ctx)
> >                                                              ^^^^^^^^^^^^^^^^^^^
> > Unchecked dereference.
> 
> 
> This is difficult... and I'm glad you flagged it. The normal de-boost
> path is through rt_mutex_adjust_prio() and that has: .orig_lock == NULL
> && .orig_waiter == NULL. And as such it would never trigger the above
> case.
> 
> However, there is remove_waiter() which is called on rt_mutex_lock()'s
> failure paths and that doesn't have .orig_lock == NULL, and as such
> *could* conceivably trigger this.
> 
> Let me figure out what the right thing to do is.
> 
> Thanks!

I think something like this ought to do.

---
Subject: locking/rtmutex: Fix ww_mutex deadlock check

Dan reported that rt_mutex_adjust_prio_chain() can be called with
.orig_waiter == NULL however commit a055fcc132d4 ("locking/rtmutex:
Return success on deadlock for ww_mutex waiters") unconditionally
dereferences it.

Since both call-sites that have .orig_waiter == NULL don't care for the
return value, simply disable the deadlock squash by adding the NULL
check.

Notably, both callers use the deadlock condition as a termination
condition for the iteration; once detected, we're sure (de)boosting is
done. Arguably [3] would be a more natural termination point, but I'm
not sure adding a third deadlock detection state would improve the code.

Fixes: a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/locking/rtmutex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 8eabdc79602b..6bb116c559b4 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -753,7 +753,7 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
 		 * other configuration and we fail to report; also, see
 		 * lockdep.
 		 */
-		if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter->ww_ctx)
+		if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter && orig_waiter->ww_ctx)
 			ret = 0;
 
 		raw_spin_unlock(&lock->wait_lock);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] locking/rtmutex: Fix ww_mutex deadlock check
  2021-09-01  9:44   ` [PATCH] locking/rtmutex: Fix ww_mutex deadlock check Peter Zijlstra
@ 2021-09-06 16:51     ` Sebastian Andrzej Siewior
  2021-09-09  8:34     ` [tip: locking/urgent] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: Sebastian Andrzej Siewior @ 2021-09-06 16:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dan Carpenter, kernel-janitors, Thomas Gleixner, linux-kernel

On 2021-09-01 11:44:11 [+0200], Peter Zijlstra wrote:
> Subject: locking/rtmutex: Fix ww_mutex deadlock check
> 
> Dan reported that rt_mutex_adjust_prio_chain() can be called with
> .orig_waiter == NULL however commit a055fcc132d4 ("locking/rtmutex:
> Return success on deadlock for ww_mutex waiters") unconditionally
> dereferences it.
> 
> Since both call-sites that have .orig_waiter == NULL don't care for the
> return value, simply disable the deadlock squash by adding the NULL
> check.
> 
> Notably, both callers use the deadlock condition as a termination
> condition for the iteration; once detected, we're sure (de)boosting is
> done. Arguably [3] would be a more natural termination point, but I'm
> not sure adding a third deadlock detection state would improve the code.
> 
> Fixes: a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters")
> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

It sounds reasonable and I don't see any fallout in the testsuite so

  Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

> ---
>  kernel/locking/rtmutex.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index 8eabdc79602b..6bb116c559b4 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -753,7 +753,7 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
>  		 * other configuration and we fail to report; also, see
>  		 * lockdep.
>  		 */
> -		if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter->ww_ctx)
> +		if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter && orig_waiter->ww_ctx)
>  			ret = 0;
>  
>  		raw_spin_unlock(&lock->wait_lock);

Sebastian

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [tip: locking/urgent] locking/rtmutex: Fix ww_mutex deadlock check
  2021-09-01  9:44   ` [PATCH] locking/rtmutex: Fix ww_mutex deadlock check Peter Zijlstra
  2021-09-06 16:51     ` Sebastian Andrzej Siewior
@ 2021-09-09  8:34     ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-09-09  8:34 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Dan Carpenter, Peter Zijlstra (Intel),
	Thomas Gleixner, Sebastian Andrzej Siewior, x86, linux-kernel

The following commit has been merged into the locking/urgent branch of tip:

Commit-ID:     e5480572706da1b2c2dc2c6484eab64f92b9263b
Gitweb:        https://git.kernel.org/tip/e5480572706da1b2c2dc2c6484eab64f92b9263b
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 01 Sep 2021 11:44:11 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Thu, 09 Sep 2021 10:31:22 +02:00

locking/rtmutex: Fix ww_mutex deadlock check

Dan reported that rt_mutex_adjust_prio_chain() can be called with
.orig_waiter == NULL however commit a055fcc132d4 ("locking/rtmutex: Return
success on deadlock for ww_mutex waiters") unconditionally dereferences it.

Since both call-sites that have .orig_waiter == NULL don't care for the
return value, simply disable the deadlock squash by adding the NULL check.

Notably, both callers use the deadlock condition as a termination condition
for the iteration; once detected, it is sure that (de)boosting is done.
Arguably step [3] would be a more natural termination point, but it's
dubious whether adding a third deadlock detection state would improve the
code.

Fixes: a055fcc132d4 ("locking/rtmutex: Return success on deadlock for ww_mutex waiters")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/YS9La56fHMiCCo75@hirez.programming.kicks-ass.net

---
 kernel/locking/rtmutex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 8eabdc7..6bb116c 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -753,7 +753,7 @@ static int __sched rt_mutex_adjust_prio_chain(struct task_struct *task,
 		 * other configuration and we fail to report; also, see
 		 * lockdep.
 		 */
-		if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter->ww_ctx)
+		if (IS_ENABLED(CONFIG_PREEMPT_RT) && orig_waiter && orig_waiter->ww_ctx)
 			ret = 0;
 
 		raw_spin_unlock(&lock->wait_lock);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-09-09  8:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-31  8:21 [bug report] locking/rtmutex: Return success on deadlock for ww_mutex waiters Dan Carpenter
2021-09-01  8:09 ` Peter Zijlstra
2021-09-01  9:44   ` [PATCH] locking/rtmutex: Fix ww_mutex deadlock check Peter Zijlstra
2021-09-06 16:51     ` Sebastian Andrzej Siewior
2021-09-09  8:34     ` [tip: locking/urgent] " tip-bot2 for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.