* [PATCH 0/1] Faux deadlock detection fixup
@ 2014-05-23 14:30 Brad Mouring
  2014-05-23 14:30 ` [PATCH 1/1] rtmutex: Handle when top lock owner changes Brad Mouring
  0 siblings, 1 reply; 23+ messages in thread
From: Brad Mouring @ 2014-05-23 14:30 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Steven Rostedt, Brad Mouring

Greetings punctual folks (or at least folks who make linux punctual).

I recently ran into an issue that I've been spending a few
weeks trying to characterize, which includes trying my best to
comprehend glibc's pthread_mutex implementation that uses futexes
(which, in turn, use rtmutexes). I'll try my best to describe the
situation I saw before presenting the patch that I wrote to fix the
issue. By all means, if I get some part of this wrong or the approach
taken to resolve the issue is incorrect, please let me know. Without
further ado, consider the following scenario for a priority chain:

All tasks involved here are SCHED_OTHER, so same priority. All user-
space mutexes are pthread_mutexes configured to be PI (useless if all
threads are SCHED_OTHER, I know, but we allow users to give rt
priorities to threads, just not in this instance) and recursive (this
doesn't seem to be involved here, as recursive relocking stays in
userspace).
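
For reference, that configuration looks more or less like the
following in userspace (an illustrative sketch, error checking
elided; the names are mine, not from our framework):

  #include <pthread.h>

  static pthread_mutex_t m;

  static void init_pi_recursive_mutex(void)
  {
          pthread_mutexattr_t attr;

          pthread_mutexattr_init(&attr);
          /* PI: contended lock/unlock drops into sys_futex/rtmutex */
          pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
          /* recursive: relocking by the owner stays in userspace */
          pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
          pthread_mutex_init(&m, &attr);
          pthread_mutexattr_destroy(&attr);
  }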

Thread (task) C attempts to take a pthread_mutex that's already
owned, resulting in a sys_futex call. The current priority chain is
as follows:

task A owns lock L1
 task^           L1 blocks task D < top_waiter on L1
                 L1 blocks task B
             lock^         task B owns lock L2
                                            L2 blocks task C
                                                    current^

We walk the chain. At the point where we attempt to take task A's
pi_lock (and it is contended), or at the explicit attempt to take L1's
wait_lock, either lock being contended can result in C being scheduled
out. The top_waiter variable currently points at "L1 blocks task D",
and the task variable currently points at A.

When C is scheduled out, A is scheduled in. A releases L1, wins L3 (in
userspace, since it's uncontended), and turns around and blocks on L2.

Since A released L1, this frees D to take it. D is running the same
code path as A, so it does what it needs to with L1, releases it,
attempts to take L3, and blocks. The funny thing is, the waiter for
this new dependency sits at the same address that top_waiter points
to: the address that expressed "L1 blocks D" back when top_waiter was
recorded as C blocked on L2.
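
Why can the address alias? The rt_mutex_waiter lives on the blocking
task's stack. A heavily condensed sketch of the slowpath in the
kernel/locking/rtmutex.c of this era (details elided):

  static int rt_mutex_slowlock(struct rt_mutex *lock, ...)
  {
          struct rt_mutex_waiter waiter;  /* on the caller's stack */

          /* enqueue &waiter on @lock, then schedule() until we own it */
  }

So when D blocks again through the same call path, the new waiter
occupies the very same stack slot that previously meant "L1 blocks D".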

At this point, the chain resembles some facsimile of the
following:

task B owns L2
            L2 blocks task C
            L2 blocks task A
                      task A owns L3
                       task^      L3 blocks task D < top_waiter

Task C is scheduled back in after this shuffle and resumes walking the
prio_chain. It *passes* the test comparing top_waiter against task's
top pi_waiter (top_waiter is tragically pointing at "L3 blocks D" now),
and then the test to see if the lock that's blocking A is the lock
that C originally blocked on. Hmm.

We report EDEADLK to userspace, which, sanely, SIGABRTs our process.

This series of events seems outlandish, but I have traces that show
just this behavior (and similar notes) at 
http://www.the-bradlands.net/waitergate.tar.gz

Anyway, this patch simply checks to ensure that the current task is
indeed the owner of the current lock examined. A test in our 
(unfortunately complicated) execution framework (LabVIEW) would
reproduce the issue on an arm cortex-a9 within minutes and an x64
Atom within hours. After applying this patch, we were unable to 
reproduce the issue after running the test for days.

Originally, the check and fixup happened outside of the deadlock
detection block, but we found the fixup code was being hit with
alarming regularity, considering we only wanted to address the false
deadlock case, and so I was convinced to move it inside the
deadlock-detected block.

Brad Mouring (1):
  rtmutex: Handle when top lock owner changes

 kernel/locking/rtmutex.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

-- 
1.8.3-rc3



* [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-05-23 14:30 [PATCH 0/1] Faux deadlock detection fixup Brad Mouring
@ 2014-05-23 14:30 ` Brad Mouring
  2014-06-04  1:06   ` Steven Rostedt
  0 siblings, 1 reply; 23+ messages in thread
From: Brad Mouring @ 2014-05-23 14:30 UTC (permalink / raw)
  To: linux-rt-users; +Cc: Steven Rostedt, Brad Mouring

If, during walking the priority chain on a task blocking on a rtmutex,
and the task is examining the waiter blocked on the lock owned by a task
that is not blocking (the end of the chain), the current task is ejected
from the processor and the owner of the end lock is scheduled in,
releasing that lock, before the original task is scheduled back in, the
task misses the fact that the previous owner of the current lock no
longer holds it.

Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Acked-by: Scot Salmon <scot.salmon@ni.com>
Acked-by: Ben Shelton <ben.shelton@ni.com>
Tested-by: Jeff Westfahl <jeff.westfahl@ni.com>
---
 kernel/locking/rtmutex.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index fbf152b..029a9ab 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -384,6 +384,25 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
 
 	/* Deadlock detection */
 	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
+		/*
+		 * If the prio chain has changed out from under us, set the task
+		 * to the current owner of the lock in the current waiter and
+		 * continue walking the prio chain
+		 */
+		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {
+			/* Release the old owner */
+			raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+			put_task_struct(task);
+
+			/* Move to the new owner */
+			task = rt_mutex_owner(lock);
+			get_task_struct(task);
+
+			/* Let's try this again */
+			raw_spin_unlock(&lock->wait_lock);
+			goto retry;
+		}
+
 		debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
 		raw_spin_unlock(&lock->wait_lock);
 		ret = deadlock_detect ? -EDEADLK : 0;
-- 
1.8.3-rc3



* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-05-23 14:30 ` [PATCH 1/1] rtmutex: Handle when top lock owner changes Brad Mouring
@ 2014-06-04  1:06   ` Steven Rostedt
  2014-06-04 13:05     ` Brad Mouring
  2014-06-04 15:32     ` Thomas Gleixner
  0 siblings, 2 replies; 23+ messages in thread
From: Steven Rostedt @ 2014-06-04  1:06 UTC (permalink / raw)
  To: Brad Mouring
  Cc: linux-rt-users, Brad Mouring, Thomas Gleixner, LKML,
	Peter Zijlstra, Ingo Molnar, Clark Williams

Added LKML and Thomas et al., as this looks to affect mainline too, and
we've been having so much fun with futexes lately.

What I've thought about is this scenario. In
rt_mutex_adjust_prio_chain(), at the bottom of the loop, just before
goto again is called, all locks are released, and we are fully
preemptible. That means, at that moment, anything can happen.
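
For readers without the source handy, the shape of that walk, heavily
condensed from the rt_mutex_adjust_prio_chain() of this era (bail-out
checks and the actual boost/requeue work elided; structure only, not
compilable as-is):

   again:
        /* ... depth limit check, -EDEADLK on overflow ... */
   retry:
        raw_spin_lock_irqsave(&task->pi_lock, flags);
        waiter = task->pi_blocked_on;
        /* ... bail if !waiter, orig_waiter gone, or top_waiter stale ... */
        lock = waiter->lock;
        if (!raw_spin_trylock(&lock->wait_lock)) {
                raw_spin_unlock_irqrestore(&task->pi_lock, flags);
                cpu_relax();
                goto retry;
        }
        /* Deadlock detection */
        if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
                /* ... -EDEADLK if deadlock_detect ... */
        }
        /* ... requeue the waiter, adjust the owner's prio ... */
        raw_spin_unlock_irqrestore(&task->pi_lock, flags);
        put_task_struct(task);
        task = rt_mutex_owner(lock);    /* walk up to the lock owner */
        get_task_struct(task);
        /* ... */
        raw_spin_unlock(&lock->wait_lock);
        /* all locks dropped: fully preemptible, anything can happen */
        goto again;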

The reason we are in this code is because we blocked on a lock and we
are pushing the inheritance up as well as checking for deadlocks. But,
there's a flaw here in the deadlock case. Let's say we have this.

Tasks A, B, C and D

Locks L1, L2, L3, L4.

D owns L4, C owns L3, B owns L2. C tries to take L4 and blocks, B tries
to take L3, blocks. We then have:

  L2->B->L3->C->L4->D

Perfectly fine. Then A comes along and blocks on L2, where we would
have:

  A->L2->B->L3->C->L4->D

Let's say that A is on its chain walk, just before that goto again, and task
is D. top_waiter is C. As all locks are released and preemption is
enabled, lots of things can happen. Let's say they all release their
locks! And now we just have:

  A->L2

but things are still running, and they take the locks such that we have:

  C->L1->D->L2->B
         A->L2

That is, B grabbed L2 (stole it from A), D grabbed L1, D blocked on L2
and C blocked on L1. Now A gets scheduled in and continues.

  task->pi_blocked_on

Yep, as task is D, and it is blocked on L2, that is true.

  orig_waiter && !rt_mutex_owner(orig_lock)

well, L2 (orig_lock) has an owner, thus it won't exit due to this.

  top_waiter && (!task_has_pi_waiters(task) ||
		top_waiter != task_top_pi_waiter(task))

top_waiter is C, and D has pi_waiters, and C is still the top pi waiter
for D.

Then we get to the test.

  lock == orig_lock || rt_mutex_owner(lock) == top_task

lock happens to be L2 and this is the original lock we wanted to take.
This reports a deadlock, but no deadlock scenario ever occurred.

I'm not sure if Brad's patch addresses this, but when reviewing
possible scenarios, this came to my mind.

-- Steve



On Fri, 23 May 2014 09:30:10 -0500
"Brad Mouring" <bmouring@ni.com> wrote:

> If, during walking the priority chain on a task blocking on a rtmutex,
> and the task is examining the waiter blocked on the lock owned by a task
> that is not blocking (the end of the chain), the current task is ejected
> from the processor and the owner of the end lock is scheduled in,
> releasing that lock, before the original task is scheduled back in, the
> task misses the fact that the previous owner of the current lock no
> longer holds it.
> 
> Signed-off-by: Brad Mouring <brad.mouring@ni.com>
> Acked-by: Scot Salmon <scot.salmon@ni.com>
> Acked-by: Ben Shelton <ben.shelton@ni.com>
> Tested-by: Jeff Westfahl <jeff.westfahl@ni.com>
> ---
>  kernel/locking/rtmutex.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index fbf152b..029a9ab 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -384,6 +384,25 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
>  
>  	/* Deadlock detection */
>  	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> +		/*
> +		 * If the prio chain has changed out from under us, set the task
> +		 * to the current owner of the lock in the current waiter and
> +		 * continue walking the prio chain
> +		 */
> +		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {
> +			/* Release the old owner */
> +			raw_spin_unlock_irqrestore(&task->pi_lock, flags);
> +			put_task_struct(task);
> +
> +			/* Move to the new owner */
> +			task = rt_mutex_owner(lock);
> +			get_task_struct(task);
> +
> +			/* Let's try this again */
> +			raw_spin_unlock(&lock->wait_lock);
> +			goto retry;
> +		}
> +
>  		debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
>  		raw_spin_unlock(&lock->wait_lock);
>  		ret = deadlock_detect ? -EDEADLK : 0;



* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04  1:06   ` Steven Rostedt
@ 2014-06-04 13:05     ` Brad Mouring
  2014-06-04 14:16       ` Steven Rostedt
  2014-06-04 15:32     ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Brad Mouring @ 2014-06-04 13:05 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: linux-rt-users, Thomas Gleixner, LKML, Peter Zijlstra,
	Ingo Molnar, Clark Williams

On Tue, Jun 03, 2014 at 09:06:09PM -0400, Steven Rostedt wrote:
> Added LKML and Thomas et. al., as this looks to be mainline too, and
> we've been having so much fun with futexes lately.
> 
> What I've thought about is this scenario. In
> rt_mutex_adjust_prio_chain(), at the bottom of the loop, just before
> goto again is called, all locks are released, and we are fully
> preemptible. That means, at that moment, anything can happen.
> 
> The reason we are in this code is because we blocked on a lock and we
> are pushing the inheritance up as well as checking for deadlocks. But,
> there's a flaw here in the deadlock case. Let's say we have this.
> 
> Tasks A, B, C and D
> 
> Locks L1, L2, L3, L4.
> 
> D owns L4, C owns L3, B owns L2. C tries to take L4 and blocks, B tries
> to take L3, blocks. We then have:
> 
>   L2->B->L3->C->L4->D
> 
> Perfectly fine. Then A comes along and blocks on L2, where we would
> have:
> 
>   A->L2->B->L3->C->L4->D
> 
> Let's say that A is on its chain walk, just before that goto again, and task
> is D. top_waiter is C. As all locks are released and preemption is
> enabled, lots of things can happen. Let's say they all release their
> locks! And now we just have:
> 
>   A->L2
> 
> but things are still running, and they take the locks such that we have:
> 
>   C->L1->D->L2->B
>          A->L2

This is a slight variation on what I was seeing. To use the nomenclature
that you proposed at the start, rewinding to the point

   A->L2->B->L3->C->L4->D

Let's assume things continue to unfold as you explain. Task is D,
top_waiter is C. A is scheduled out and the chain shuffles.

       A->L2->B
C->L4->D->'

So, we now have D blocking on L2 and L4 has waiters, C again. Also,
since the code path for C to block on L4 again is the same as the
code path from when it first blocked on it, the waiter's location
is the same, since the stack (for what we care about) is the same.

I can see both of these being different expressions of the same problem.

> 
> That is, B grabbed L2 (stole it from A), D grabbed L1, D blocked on L2
> and C blocked on L1. Now A gets scheduled in and continues.
> 
>   task->pi_blocked_on
> 
> Yep, as task is D, and it is blocked on L2, that is true.
> 
>   orig_waiter && !rt_mutex_owner(orig_lock)
> 
> well, L2 (orig_lock) has an owner, thus it won't exit due to this.
> 
>   top_waiter && (!task_has_pi_waiters(task) ||
> 		top_waiter != task_top_pi_waiter(task))
> 
> top_waiter is C, and D has pi_waiters, and C is still the top pi waiter
> for D.
> 
> Then we get to the test.
> 
>   lock == orig_lock || rt_mutex_owner(lock) == top_task
> 
> lock happens to be L2 and this is the original lock we wanted to take.
> This reports a deadlock, but no deadlock scenario ever occurred.
> 
> I'm not sure if Brad's patch addresses this, but when reviewing
> possible scenarios, this came to my mind.

This is what my patch addresses.

> 
> -- Steve
> 
> 
> 
> On Fri, 23 May 2014 09:30:10 -0500
> "Brad Mouring" <bmouring@ni.com> wrote:
> 
> > If, during walking the priority chain on a task blocking on a rtmutex,
> > and the task is examining the waiter blocked on the lock owned by a task
> > that is not blocking (the end of the chain), the current task is ejected
> > from the processor and the owner of the end lock is scheduled in,
> > releasing that lock, before the original task is scheduled back in, the
> > task misses the fact that the previous owner of the current lock no
> > longer holds it.
> > 
> > Signed-off-by: Brad Mouring <brad.mouring@ni.com>
> > Acked-by: Scot Salmon <scot.salmon@ni.com>
> > Acked-by: Ben Shelton <ben.shelton@ni.com>
> > Tested-by: Jeff Westfahl <jeff.westfahl@ni.com>
> > ---
> >  kernel/locking/rtmutex.c | 19 +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> > 
> > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> > index fbf152b..029a9ab 100644
> > --- a/kernel/locking/rtmutex.c
> > +++ b/kernel/locking/rtmutex.c
> > @@ -384,6 +384,25 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
> >  
> >  	/* Deadlock detection */
> >  	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > +		/*
> > +		 * If the prio chain has changed out from under us, set the task
> > +		 * to the current owner of the lock in the current waiter and
> > +		 * continue walking the prio chain
> > +		 */
> > +		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {
> > +			/* Release the old owner */
> > +			raw_spin_unlock_irqrestore(&task->pi_lock, flags);
> > +			put_task_struct(task);
> > +
> > +			/* Move to the new owner */
> > +			task = rt_mutex_owner(lock);
> > +			get_task_struct(task);
> > +
> > +			/* Let's try this again */
> > +			raw_spin_unlock(&lock->wait_lock);
> > +			goto retry;
> > +		}
> > +
> >  		debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
> >  		raw_spin_unlock(&lock->wait_lock);
> >  		ret = deadlock_detect ? -EDEADLK : 0;
> 


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 13:05     ` Brad Mouring
@ 2014-06-04 14:16       ` Steven Rostedt
  2014-06-04 14:38         ` Brad Mouring
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2014-06-04 14:16 UTC (permalink / raw)
  To: Brad Mouring
  Cc: linux-rt-users, Thomas Gleixner, LKML, Peter Zijlstra,
	Ingo Molnar, Clark Williams

On Wed, 4 Jun 2014 08:05:25 -0500
"Brad Mouring" <bmouring@ni.com> wrote:

 >          A->L2
> 
> This is a slight variation on what I was seeing. To use the nomenclature
> that you proposed at the start, rewinding to the point
> 
>    A->L2->B->L3->C->L4->D
> 
> Let's assume things continue to unfold as you explain. Task is D,
> top_waiter is C. A is scheduled out and the chain shuffles.
> 
>        A->L2->B
> C->L4->D->'

But isn't that a lock ordering problem there?

If B can block on L3 owned by C, I see the following:

  B->L3->C->L4->D->L2->B

Deadlock!

In my scenario I was very careful to point out that the lock ordering
was: L1->L2->L3->L4

But you show that we can have both:

   L2-> ... ->L4

    and

   L4-> ... ->L2

Which is a reverse of lock ordering and a possible deadlock can occur.

-- Steve


> 
> So, we now have D blocking on L2 and L4 has waiters, C again. Also,
> since the code path for C to block on L4 again is the same as the
> code path from when it first blocked on it, the waiter's location
> is the same, since the stack (for what we care about) is the same.
> 


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 14:16       ` Steven Rostedt
@ 2014-06-04 14:38         ` Brad Mouring
  2014-06-04 14:58           ` Steven Rostedt
  0 siblings, 1 reply; 23+ messages in thread
From: Brad Mouring @ 2014-06-04 14:38 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Brad Mouring, linux-rt-users, Thomas Gleixner, LKML,
	Peter Zijlstra, Ingo Molnar, Clark Williams

On Wed, Jun 04, 2014 at 10:16:12AM -0400, Steven Rostedt wrote:
> On Wed, 4 Jun 2014 08:05:25 -0500
> "Brad Mouring" <bmouring@ni.com> wrote:
> 
>  >          A->L2
> > 
> > This is a slight variation on what I was seeing. To use the nomenclature
> > that you proposed at the start, rewinding to the point
> > 
> >    A->L2->B->L3->C->L4->D
> > 
> > Let's assume things continue to unfold as you explain. Task is D,
> > top_waiter is C. A is scheduled out and the chain shuffles.
> > 
> >        A->L2->B
> > C->L4->D->'
> 
> But isn't that a lock ordering problem there?
> 
> If B can block on L3 owned by C, I see the following:
> 
>   B->L3->C->L4->D->L2->B
> 
> Deadlock!
Yes, it could be. But currently no one owns L3. B is currently not
blocked. Under these circumstances, there is no deadlock. Also, I
somewhat arbitrarily picked L4, it could be Lfoo that C blocks on
since the process is
...
waiter = D->pi_blocked_on

// waiter is real_waiter D->L2

// orig_waiter still there, orig_lock still has an owner

// top_waiter was pointing to C->L4, now points to C->Lfoo
// D does have top_waiters, and, as noted above, it aliased
// to encompass a different waiter scenario

> 
> In my scenario I was very careful to point out that the lock ordering
> was: L1->L2->L3->L4
> 
> But you show that we can have both:
> 
>    L2-> ... ->L4
> 
>     and
> 
>    L4-> ... ->L2
> 
> Which is a reverse of lock ordering and a possible deadlock can occur.

So the numbering/ordering of the locks is really somewhat arbitrary.
Here we *can* have L2-> ... ->L4 (if B decides to block on L2, it
could just as easily block on L8), and we absolutely have
L4-> ... ->L2. A deadlock *could* occur, but in all of the traces that
I dug through, no actual deadlocks occurred.
> 
> -- Steve
> 
> 
> > 
> > So, we now have D blocking on L2 and L4 has waiters, C again. Also,
> > since the code path for C to block on L4 again is the same as the
> > code path from when it first blocked on it, the waiter's location
> > is the same, since the stack (for what we care about) is the same.
> > 


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 14:38         ` Brad Mouring
@ 2014-06-04 14:58           ` Steven Rostedt
  2014-06-04 15:11             ` Brad Mouring
  0 siblings, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2014-06-04 14:58 UTC (permalink / raw)
  To: Brad Mouring
  Cc: linux-rt-users, Thomas Gleixner, LKML, Peter Zijlstra,
	Ingo Molnar, Clark Williams

On Wed, 4 Jun 2014 09:38:30 -0500
"Brad Mouring" <bmouring@ni.com> wrote:

> On Wed, Jun 04, 2014 at 10:16:12AM -0400, Steven Rostedt wrote:
> > On Wed, 4 Jun 2014 08:05:25 -0500
> > "Brad Mouring" <bmouring@ni.com> wrote:
> > 
> >  >          A->L2
> > > 
> > > This is a slight variation on what I was seeing. To use the nomenclature
> > > that you proposed at the start, rewinding to the point
> > > 
> > >    A->L2->B->L3->C->L4->D
> > > 
> > > Let's assume things continue to unfold as you explain. Task is D,
> > > top_waiter is C. A is scheduled out and the chain shuffles.
> > > 
> > >        A->L2->B
> > > C->L4->D->'
> > 
> > But isn't that a lock ordering problem there?
> > 
> > If B can block on L3 owned by C, I see the following:
> > 
> >   B->L3->C->L4->D->L2->B
> > 
> > Deadlock!
> Yes, it could be. But currently no one owns L3. B is currently not
> blocked. Under these circumstances, there is no deadlock. Also, I
> somewhat arbitrarily picked L4, it could be Lfoo that C blocks on
> since the process is

OK, then you should have used L1, which basically makes it exactly my
scenario ;-)

> ...
> waiter = D->pi_blocked_on
> 
> // waiter is real_waiter D->L2
> 
> // orig_waiter still there, orig_lock still has an owner
> 
> // top_waiter was pointing to C->L4, now points to C->Lfoo
> // D does have top_waiters, and, as noted above, it aliased
> // to encompass a different waiter scenario
> 
> > 
> > In my scenario I was very careful to point out that the lock ordering
> > was: L1->L2->L3->L4
> > 
> > But you show that we can have both:
> > 
> >    L2-> ... ->L4
> > 
> >     and
> > 
> >    L4-> ... ->L2
> > 
> > Which is a reverse of lock ordering and a possible deadlock can occur.
> 
> So the numbering/ordering of the locks is really somewhat arbitrary.
> Here we *can* have L2-> ... ->L4 (if B decides to block on L2, it
> could just as easily block on L8), and we absolutely have
> L4-> ... ->L2. A deadlock *could* occur, but in all of the traces that
> I dug through, no actual deadlocks occurred.

Heh, but that shows the code is broken. I'm not saying that our
deadlock detector is not returning false positives, I'm just stating
that you probably need to fix your code.

Yes, you can have a locking order of L1 -> L2 and also L2 -> L1, and if
you are lucky, that may never trigger any deadlocks. But why do you
think the kernel folks have put so much effort into lockdep? Lockdep
doesn't tell you that there is a deadlock (although it could); what it
is so useful for is telling us where there are possible deadlocks.

If your code does take L1 -> L2 and then L2 -> L1, you have a chance of
hitting a deadlock right there. If you were to run the userspace
lockdep, it would spit out a nice warning for you.
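
To make that concrete, the smallest form of the ordering in question
(a sketch with made-up names; each thread alone is fine, the reversed
ordering between the two is what lockdep would flag):

  #include <pthread.h>

  static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
  static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;

  static void *thread_one(void *arg)
  {
          pthread_mutex_lock(&L1);
          pthread_mutex_lock(&L2);        /* ordering: L1 -> L2 */
          pthread_mutex_unlock(&L2);
          pthread_mutex_unlock(&L1);
          return NULL;
  }

  static void *thread_two(void *arg)
  {
          pthread_mutex_lock(&L2);
          pthread_mutex_lock(&L1);        /* ordering: L2 -> L1 */
          pthread_mutex_unlock(&L1);
          pthread_mutex_unlock(&L2);
          return NULL;
  }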

But this is off topic, as I have shown that there exists an example
where the userspace code would never deadlock but our deadlock detector
would say it did.

-- Steve


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 14:58           ` Steven Rostedt
@ 2014-06-04 15:11             ` Brad Mouring
  0 siblings, 0 replies; 23+ messages in thread
From: Brad Mouring @ 2014-06-04 15:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Brad Mouring, linux-rt-users, Thomas Gleixner, LKML,
	Peter Zijlstra, Ingo Molnar, Clark Williams

On Wed, Jun 04, 2014 at 10:58:23AM -0400, Steven Rostedt wrote:
> On Wed, 4 Jun 2014 09:38:30 -0500
> "Brad Mouring" <bmouring@ni.com> wrote:
> 
> > On Wed, Jun 04, 2014 at 10:16:12AM -0400, Steven Rostedt wrote:
> > > On Wed, 4 Jun 2014 08:05:25 -0500
> > > "Brad Mouring" <bmouring@ni.com> wrote:
> > > 
> > >  >          A->L2
> > > > 
> > > > This is a slight variation on what I was seeing. To use the nomenclature
> > > > that you proposed at the start, rewinding to the point
> > > > 
> > > >    A->L2->B->L3->C->L4->D
> > > > 
> > > > Let's assume things continue to unfold as you explain. Task is D,
> > > > top_waiter is C. A is scheduled out and the chain shuffles.
> > > > 
> > > >        A->L2->B
> > > > C->L4->D->'
> > > 
> > > But isn't that a lock ordering problem there?
> > > 
> > > If B can block on L3 owned by C, I see the following:
> > > 
> > >   B->L3->C->L4->D->L2->B
> > > 
> > > Deadlock!
> > Yes, it could be. But currently no one owns L3. B is currently not
> > blocked. Under these circumstances, there is no deadlock. Also, I
> > somewhat arbitrarily picked L4, it could be Lfoo that C blocks on
> > since the process is
> 
> OK, then you should have used L1, which basically makes it exactly my
> scenario ;-)

Heh, fair point.

> 
> > ...
> > waiter = D->pi_blocked_on
> > 
> > // waiter is real_waiter D->L2
> > 
> > // orig_waiter still there, orig_lock still has an owner
> > 
> > // top_waiter was pointing to C->L4, now points to C->Lfoo
> > // D does have top_waiters, and, as noted above, it aliased
> > // to encompass a different waiter scenario
> > 
> > > 
> > > In my scenario I was very careful to point out that the lock ordering
> > > was: L1->L2->L3->L4
> > > 
> > > But you show that we can have both:
> > > 
> > >    L2-> ... ->L4
> > > 
> > >     and
> > > 
> > >    L4-> ... ->L2
> > > 
> > > Which is a reverse of lock ordering and a possible deadlock can occur.
> > 
> > So the numbering/ordering of the locks is really somewhat arbitrary.
> > Here we *can* have L2-> ... ->L4 (if B decides to block on L2, it
> > could just as easily block on L8), and we absolutely have
> > L4-> ... ->L2. A deadlock *could* occur, but in all of the traces that
> > I dug through, no actual deadlocks occurred.
> 
> Heh, but that shows the code is broken. I'm not saying that our
> deadlock detector is not returning false positives, I'm just stating
> that you probably need to fix your code.
> 
> Yes, you can have a locking order of L1 -> L2 and also L2 -> L1, and if
> you are lucky, that may never trigger any deadlocks. But why do you
> think the kernel folks have put so much effort into lockdep? Lockdep
> doesn't tell you that there is a deadlock (although it could); what it
> is so useful for is telling us where there are possible deadlocks.
> 
> If your code does take L1 -> L2 and then L2 -> L1, you have a chance of
> hitting a deadlock right there. If you were to run the userspace
> lockdep, it would spit out a nice warning for you.

What I was saying is that the code can take L1 -> L2 and L2 -> Lfoo.
And, in fact, a quick glance back over my notes supports just this
behavior. It was unfortunate that I decided to come up with an example
without thinking it through first.
>
> But this is off topic, as I have shown that there exists an example
> where the userspace code would never deadlock but our deadlock detector
> would say it did.
> 
> -- Steve


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04  1:06   ` Steven Rostedt
  2014-06-04 13:05     ` Brad Mouring
@ 2014-06-04 15:32     ` Thomas Gleixner
  2014-06-04 15:44       ` Steven Rostedt
  2014-06-06  3:19       ` [PATCH 1/1] " Steven Rostedt
  1 sibling, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2014-06-04 15:32 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Brad Mouring, linux-rt-users, LKML, Peter Zijlstra, Ingo Molnar,
	Clark Williams

On Tue, 3 Jun 2014, Steven Rostedt wrote:
> On Fri, 23 May 2014 09:30:10 -0500
> "Brad Mouring" <bmouring@ni.com> wrote:
> >  	/* Deadlock detection */
> >  	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > +		/*
> > +		 * If the prio chain has changed out from under us, set the task
> > +		 * to the current owner of the lock in the current waiter and
> > +		 * continue walking the prio chain
> > +		 */
> > +		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {

No, sorry. That's wrong.

Your change wrecks the rt_mutex_owner(lock) == top_task test
simply because in that case:

   (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)

evaluates to true.

Aside from that, we need to figure out whether the lock chain changed
while we dropped the locks, even in the non-deadlock case. Otherwise
we follow up the wrong chain there.

T0 blocked on L1 held by T1
T1 blocked on L2 held by T2
T2 blocked on L3 held by T3

So we walk the chain and do:

   T1 -> L2 -> T2

Now here we get preempted.

T3 releases L3
T2 gets L3
T2 drops L3 and L2
T2 blocks on L4 held by T4
T4 blocked on L5 held by T5

So we happily boost T4 and T5. Not what we really want to do.

Nasty, isn't it ?

Thanks,

	tglx


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 15:32     ` Thomas Gleixner
@ 2014-06-04 15:44       ` Steven Rostedt
  2014-06-04 18:02         ` Thomas Gleixner
  2014-06-06  3:19       ` [PATCH 1/1] " Steven Rostedt
  1 sibling, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2014-06-04 15:44 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Brad Mouring, linux-rt-users, LKML, Peter Zijlstra, Ingo Molnar,
	Clark Williams

On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> T3 releases L3
> T2 gets L3
> T2 drops L3 and L2
> T2 blocks on L4 held by T4
> T4 blocked on L5 held by T5
> 
> So we happily boost T4 and T5. Not what we really want to do.
> 
> Nasty, isn't it ?
> 

Actually, we may go up a chain, but we never do any unnecessary
boosting. That's because the boost is done with rt_mutex_adjust_prio()
which gets the prio from rt_mutex_getprio() which reads the
task->normal_prio and compares it to the task_top_pi_waiter(task)->prio,
which will always be correct as we have the necessary locks.

And we don't even need to worry about the chain we miss. That is, if
task A is blocked on a lock owned by D at the time, but as we go up the
chain, D releases the lock and B grabs it, B will still up its priority
based on the waiters of the lock (that is A), and if B blocks, it will
boost the tasks that own the lock it blocks on, where B is still
influenced by A.

The fact that we only update the prio based on the actual waiters and
don't carry a prio up the chain (which you designed, and I thought was
quite ingenious by the way), we may waste time going up a chain, but
the priority inheritance is still accurate.
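
For reference, the helper in question, roughly as it reads in the
rtmutex.c of this era (note that the effective prio is recomputed from
the actual top waiter every time, never carried along the chain):

  int rt_mutex_getprio(struct task_struct *task)
  {
          if (likely(!task_has_pi_waiters(task)))
                  return task->normal_prio;

          /* lower number == higher priority */
          return min(task_top_pi_waiter(task)->prio,
                     task->normal_prio);
  }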

-- Steve


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 15:44       ` Steven Rostedt
@ 2014-06-04 18:02         ` Thomas Gleixner
  2014-06-04 18:12           ` Steven Rostedt
  2014-06-04 19:25           ` Brad Mouring
  0 siblings, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2014-06-04 18:02 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Brad Mouring, linux-rt-users, LKML, Peter Zijlstra, Ingo Molnar,
	Clark Williams

On Wed, 4 Jun 2014, Steven Rostedt wrote:

> On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > T3 releases L3
> > T2 gets L3
> > T2 drops L3 and L2
> > T2 blocks on L4 held by T4
> > T4 blocked on L5 held by T5
> > 
> > So we happily boost T4 and T5. Not what we really want to do.
> > 
> > Nasty, isn't it ?
> > 
> 
> Actually, we may go up a chain, but we never do any unnecessary
> boosting. That's because the boost is done with rt_mutex_adjust_prio()
> which gets the prio from rt_mutex_getprio() which reads the
> task->normal_prio and compares it to the task_top_pi_waiter(task)->prio,
> which will always be correct as we have the necessary locks.

Indeed.
 
> And we don't even need to worry about the chain we miss. That is, if
> task A is blocked on a lock owned by D at the time, but as we go up the
> chain, D releases the lock and B grabs it, B will still up its priority
> based on the waiters of the lock (that is A), and if B blocks, it will
> boost the tasks that own the lock it blocks on, where B is still
> influenced by A.
> 
> The fact that we only update the prio based on the actual waiters and
> don't carry a prio up the chain (which you designed, and I thought was
> quite ingenious by the way), we may waste time going up a chain, but
> the priority inheritance is still accurate.

Duh. I actually had to look up my notes from back then. There is even a
lengthy IRC discussion about not propagating the top waiter's prio,
but looking up the actual lock waiters. Good, so we just walk for
nothing and waste some cpu cycles.

My brain still suffers from 3 days staring into futex.c

I'll fix up the check so it won't break the real deadlock case and queue
it.

Thanks,

	tglx



* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 18:02         ` Thomas Gleixner
@ 2014-06-04 18:12           ` Steven Rostedt
  2014-06-04 20:49             ` Thomas Gleixner
  2014-06-04 19:25           ` Brad Mouring
  1 sibling, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2014-06-04 18:12 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Brad Mouring, linux-rt-users, LKML, Peter Zijlstra, Ingo Molnar,
	Clark Williams

On Wed, 4 Jun 2014 20:02:16 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

 
> My brain still suffers from 3 days staring into futex.c

Hopefully that's not permanent damage.

> 
> I'll fix up the check so it won't break the real deadlock case and queue
> it.
> 

Cool. Could you include my write-up on the locking scenario in the
change log so that it is documented somewhere other than mail archives?

Thanks,

-- Steve


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 18:02         ` Thomas Gleixner
  2014-06-04 18:12           ` Steven Rostedt
@ 2014-06-04 19:25           ` Brad Mouring
  2014-06-04 19:53             ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Brad Mouring @ 2014-06-04 19:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Steven Rostedt, Brad Mouring, linux-rt-users, LKML,
	Peter Zijlstra, Ingo Molnar, Clark Williams

On Wed, Jun 04, 2014 at 08:02:16PM +0200, Thomas Gleixner wrote:
> On Wed, 4 Jun 2014, Steven Rostedt wrote:
> 
> > On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> > 
> > > T3 releases L3
> > > T2 gets L3
> > > T2 drops L3 and L2
> > > T2 blocks on L4 held by T4
> > > T4 blocked on L5 held by T5
> > > 
> > > So we happily boost T4 and T5. Not what we really want to do.
> > > 
> > > Nasty, isn't it ?
> > > 
> > 
> > Actually, we may go up a chain, but we never do any unnecessary
> > boosting. That's because the boost is done with rt_mutex_adjust_prio()
> > which gets the prio from rt_mutex_getprio() which reads the
> > task->normal_prio and compares it to the task_top_pi_waiter(task)->prio,
> > which will always be correct as we have the necessary locks.
> 
> Indeed.

I had thought through how to determine, from what we knew at
this point, whether or not we were walking a different chain, for just
this concern, but I had convinced myself that it would be cumbersome,
error-prone, and, as pointed out here, not vital.

>  
> > And we don't even need to worry about the chain we miss. That is, if
> > task A is blocked on a lock owned by D at the time, but as we go up the
> > chain, D releases the lock and B grabs it, B will still up its priority
> > based on the waiters of the lock (that is A), and if B blocks, it will
> > boost the tasks that own the lock it blocks on, where B is still
> > influenced by A.
> > 
> > The fact that we only update the prio based on the actual waiters and
> > don't carry a prio up the chain (which you designed, and I thought was
> > quite ingenious by the way), we may waste time going up a chain, but
> > the priority inheritance is still accurate.
> 
> Duh. I actually had to look up my notes from back then. There is even a
> lengthy IRC discussion about not propagating the top waiter's prio,
> but looking up the actual lock waiters. Good, so we just walk for
> nothing and waste some cpu cycles.
> 
I put the check within the deadlock detection block to try to minimize
this case. As evidenced by the spinning on this, it's a rare case where
your userspace code has to be doing some wacky (but valid) stuff.

> My brain still suffers from 3 days staring into futex.c
> 
> I'll fix up the check so it won't break the real deadlock case and queue
> it.

How would the change break the real deadlock case?

> 
> Thanks,
> 
> 	tglx
> 


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 19:25           ` Brad Mouring
@ 2014-06-04 19:53             ` Thomas Gleixner
  2014-06-04 20:07               ` Brad Mouring
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2014-06-04 19:53 UTC (permalink / raw)
  To: Brad Mouring
  Cc: Steven Rostedt, linux-rt-users, LKML, Peter Zijlstra,
	Ingo Molnar, Clark Williams

On Wed, 4 Jun 2014, Brad Mouring wrote:
> On Wed, Jun 04, 2014 at 08:02:16PM +0200, Thomas Gleixner wrote:
> > I'll fix up the check so it won't break the real deadlock case and queue
> > it.
> 
> How would the change break the real deadlock case?

> >  	/* Deadlock detection */
> >  	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > +		/*
> > +		 * If the prio chain has changed out from under us, set the task
> > +		 * to the current owner of the lock in the current waiter and
> > +		 * continue walking the prio chain
> > +		 */
> > +		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {

No, sorry. That's wrong.

Your change wrecks the rt_mutex_owner(lock) == top_task test
simply because in that case:

   (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)

evaluates to true.

So we want this:

Index: tip/kernel/locking/rtmutex.c
===================================================================
--- tip.orig/kernel/locking/rtmutex.c
+++ tip/kernel/locking/rtmutex.c
@@ -375,6 +375,26 @@ static int rt_mutex_adjust_prio_chain(st
 	 * walk, we detected a deadlock.
 	 */
 	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
+		/*
+		 * If the prio chain has changed out from under us, set the task
+		 * to the current owner of the lock in the current waiter and
+		 * continue walking the prio chain
+		 */
+		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
+		    rt_mutex_owner(lock) != top_task) {
+			/* Release the old owner */
+			raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+			put_task_struct(task);
+
+			/* Move to the new owner */
+			task = rt_mutex_owner(lock);
+			get_task_struct(task);
+
+			/* Let's try this again */
+			raw_spin_unlock(&lock->wait_lock);
+			goto retry;
+		}
+
 		debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
 		raw_spin_unlock(&lock->wait_lock);
 		ret = deadlock_detect ? -EDEADLK : 0;


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 19:53             ` Thomas Gleixner
@ 2014-06-04 20:07               ` Brad Mouring
  2014-06-04 20:41                 ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Brad Mouring @ 2014-06-04 20:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Brad Mouring, Steven Rostedt, linux-rt-users, LKML,
	Peter Zijlstra, Ingo Molnar, Clark Williams

On Wed, Jun 04, 2014 at 09:53:16PM +0200, Thomas Gleixner wrote:
> On Wed, 4 Jun 2014, Brad Mouring wrote:
> > On Wed, Jun 04, 2014 at 08:02:16PM +0200, Thomas Gleixner wrote:
> > > I'll fix up the check so it won't break the real deadlock case and queue
> > > it.
> > 
> > How would the change break the real deadlock case?
> 
> > >  	/* Deadlock detection */
> > >  	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > > +		/*
> > > +		 * If the prio chain has changed out from under us, set the task
> > > +		 * to the current owner of the lock in the current waiter and
> > > +		 * continue walking the prio chain
> > > +		 */
> > > +		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {
> 
> No, sorry. That's wrong.
> 
> Your change wrecks the rt_mutex_owner(lock) == top_task test
> simply because in that case:
> 
>    (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)
> 
> evaluates to true.

Ah. Yeah. I haven't tested this but it seems sane to me.

> 
> So we want this:
> 
> Index: tip/kernel/locking/rtmutex.c
> ===================================================================
> --- tip.orig/kernel/locking/rtmutex.c
> +++ tip/kernel/locking/rtmutex.c
> @@ -375,6 +375,26 @@ static int rt_mutex_adjust_prio_chain(st
>  	 * walk, we detected a deadlock.
>  	 */
>  	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> +		/*
> +		 * If the prio chain has changed out from under us, set the task
> +		 * to the current owner of the lock in the current waiter and
> +		 * continue walking the prio chain
> +		 */
> +		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
> +		    rt_mutex_owner(lock) != top_task) {
> +			/* Release the old owner */
> +			raw_spin_unlock_irqrestore(&task->pi_lock, flags);
> +			put_task_struct(task);
> +
> +			/* Move to the new owner */
> +			task = rt_mutex_owner(lock);
> +			get_task_struct(task);
> +
> +			/* Let's try this again */
> +			raw_spin_unlock(&lock->wait_lock);
> +			goto retry;
> +		}
> +
>  		debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
>  		raw_spin_unlock(&lock->wait_lock);
>  		ret = deadlock_detect ? -EDEADLK : 0;


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 20:07               ` Brad Mouring
@ 2014-06-04 20:41                 ` Thomas Gleixner
  2014-06-04 22:22                   ` [PATCH] " Brad Mouring
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2014-06-04 20:41 UTC (permalink / raw)
  To: Brad Mouring
  Cc: Steven Rostedt, linux-rt-users, LKML, Peter Zijlstra,
	Ingo Molnar, Clark Williams

On Wed, 4 Jun 2014, Brad Mouring wrote:
> On Wed, Jun 04, 2014 at 09:53:16PM +0200, Thomas Gleixner wrote:
> > Your change wrecks the rt_mutex_owner(lock) == top_task test
> > simply because in that case:
> > 
> >    (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)
> > 
> > evaluates to true.
> 
> Ah. Yeah. I haven't tested this but it seems sane to me.
> 
> > 
> > So we want this:
> > 
> > Index: tip/kernel/locking/rtmutex.c
> > ===================================================================
> > --- tip.orig/kernel/locking/rtmutex.c
> > +++ tip/kernel/locking/rtmutex.c
> > @@ -375,6 +375,26 @@ static int rt_mutex_adjust_prio_chain(st
> >  	 * walk, we detected a deadlock.
> >  	 */
> >  	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > +		/*
> > +		 * If the prio chain has changed out from under us, set the task
> > +		 * to the current owner of the lock in the current waiter and
> > +		 * continue walking the prio chain
> > +		 */
> > +		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
> > +		    rt_mutex_owner(lock) != top_task) {
> > +			/* Release the old owner */

That's not the old owner. It's the task which was blocked in the lock
chain before we dropped the locks and got preemptible. So we come here
just in the case that the task is now blocked on orig_lock.

We really want to be more than careful about the comments here. The
damned thing is complex enough already, so confusing comments are
actually worse than no comments.

Care to resend?

Thanks,

	tglx


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 18:12           ` Steven Rostedt
@ 2014-06-04 20:49             ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2014-06-04 20:49 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Brad Mouring, linux-rt-users, LKML, Peter Zijlstra, Ingo Molnar,
	Clark Williams

On Wed, 4 Jun 2014, Steven Rostedt wrote:

> On Wed, 4 Jun 2014 20:02:16 +0200 (CEST)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> 
>  
> > My brain still suffers from 3 days staring into futex.c
> 
> Hopefully that's not permanent damage.

Staring into rtmutex.c does not make it any better ...


* [PATCH] rtmutex: Handle when top lock owner changes
  2014-06-04 20:41                 ` Thomas Gleixner
@ 2014-06-04 22:22                   ` Brad Mouring
  2014-06-04 23:03                     ` Thomas Gleixner
  0 siblings, 1 reply; 23+ messages in thread
From: Brad Mouring @ 2014-06-04 22:22 UTC (permalink / raw)
  To: linux-rt-users
  Cc: Thomas Gleixner, Steven Rostedt, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Clark Williams, Brad Mouring

If, during walking the priority chain on a task blocking on a rtmutex,
and the task is examining the waiter blocked on the lock owned by a task
that is not blocking (the end of the chain), the current task is ejected
from the processor and the owner of the end lock is scheduled in,
releasing that lock, before the original task is scheduled back in, the
task misses the fact that the previous owner of the current lock no
longer holds it.

Consider the following scenario:
Tasks A, B, C, and D
Locks L1, L2, L3, and L4

D owns L4, C owns L3, B owns L2. C blocks on L4, B blocks on L3.

We have
L2->B->L3->C->L4->D

A comes along and blocks on L2.
A->L2->B->L3->C->L4->D

We walk the priority chain; at this point in the walk, task points
to D and top_waiter at C->L4. We fail to take L4's pi_lock
and are scheduled out.

Let's assume that the chain changes prior to A being scheduled in.
All of the owners finish with their locks and drop them. We have

A->L2

But, as things are still running, the chain can continue to change,
leading to

       A->L2->B
C->L1->D->L2

That is, B ends up winning L2, D blocks on L2 after grabbing L1,
and L1 blocks C. A is scheduled back in and continues the walk.

Since task was pointing to D, and D is indeed blocked, it will
have a waiter (D->L2), and, sadly, the lock is orig_lock. The
deadlock detection will come in and report a deadlock to userspace.

This change provides an additional check for this situation before
reporting a deadlock to userspace.

Signed-off-by: Brad Mouring <brad.mouring@ni.com>
Acked-by: Scot Salmon <scot.salmon@ni.com>
Acked-by: Ben Shelton <ben.shelton@ni.com>
Tested-by: Jeff Westfahl <jeff.westfahl@ni.com>
---
 kernel/locking/rtmutex.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index fbf152b..8ad7f7d 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -384,6 +384,26 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
 
 	/* Deadlock detection */
 	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
+		/*
+		 * If the prio chain has changed out from under us, set the task
+		 * to the current owner of the lock in the current waiter and
+		 * continue walking the prio chain
+		 */
+		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
+			rt_mutex_owner(lock) != top_task) {
+			/* Release the old task (blocked before the chain chaged) */
+			raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+			put_task_struct(task);
+
+			/* Move to the owner of the lock now described in waiter */
+			task = rt_mutex_owner(lock);
+			get_task_struct(task);
+
+			/* Let's try this again */
+			raw_spin_unlock(&lock->wait_lock);
+			goto retry;
+		}
+
 		debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
 		raw_spin_unlock(&lock->wait_lock);
 		ret = deadlock_detect ? -EDEADLK : 0;
-- 
1.8.3-rc3



* Re: [PATCH] rtmutex: Handle when top lock owner changes
  2014-06-04 22:22                   ` [PATCH] " Brad Mouring
@ 2014-06-04 23:03                     ` Thomas Gleixner
  0 siblings, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2014-06-04 23:03 UTC (permalink / raw)
  To: Brad Mouring
  Cc: linux-rt-users, Steven Rostedt, linux-kernel, Peter Zijlstra,
	Ingo Molnar, Clark Williams, Brad Mouring


On Wed, 4 Jun 2014, Brad Mouring wrote:

> If, during walking the priority chain on a task blocking on a rtmutex,
> and the task is examining the waiter blocked on the lock owned by a task
> that is not blocking (the end of the chain), the current task is ejected
> from the processor and the owner of the end lock is scheduled in,
> releasing that lock, before the original task is scheduled back in, the
> task misses the fact that the previous owner of the current lock no
> longer holds it.

-ENOPARSE,
 
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index fbf152b..8ad7f7d 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -384,6 +384,26 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
>  
>  	/* Deadlock detection */

Does not apply against 3.15-rc8

>  	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> +		/*
> +		 * If the prio chain has changed out from under us, set the task
> +		 * to the current owner of the lock in the current waiter and
> +		 * continue walking the prio chain
> +		 */

You are still describing what the code is doing, not WHY.

    Why can it happen that the prio chain changed under us?

    Why do we set task to the current owner of the lock ?

    Why does it make sense to retry the chain walk from the very
    beginning?

    What are the conditions which make us go into that code path?

> +		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
> +			rt_mutex_owner(lock) != top_task) {
> +			/* Release the old task (blocked before the chain chaged) */

chaged?

old task ?

Again:

> We really want to be more than careful about the comments here. The
> damned thing is complex enough already, so confusing comments are
> actually worse than no comments.

This is not a speed-coding contest. Take your time, think about it, run
it through your coworkers and then post again.

Last warning.

Thanks,

	tglx


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-04 15:32     ` Thomas Gleixner
  2014-06-04 15:44       ` Steven Rostedt
@ 2014-06-06  3:19       ` Steven Rostedt
  2014-06-06  5:40         ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Steven Rostedt @ 2014-06-06  3:19 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Brad Mouring, linux-rt-users, LKML, Peter Zijlstra, Ingo Molnar,
	Clark Williams

On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Tue, 3 Jun 2014, Steven Rostedt wrote:
> > On Fri, 23 May 2014 09:30:10 -0500
> > "Brad Mouring" <bmouring@ni.com> wrote:
> > >  	/* Deadlock detection */
> > >  	if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > > +		/*
> > > +		 * If the prio chain has changed out from under us, set the task
> > > +		 * to the current owner of the lock in the current waiter and
> > > +		 * continue walking the prio chain
> > > +		 */
> > > +		if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {
> 
> No, sorry. That's wrong.
> 
> Your change wrecks the rt_mutex_owner(lock) == top_task test
> simply because in that case:
> 
>    (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)
> 
> evaluates to true.

I'm not so sure that's true. Because if this is a deadlock in the case
that rt_mutex_owner(lock) == top_task, then task == top_task should
also be true.

> 
> Aside from that, we need to figure out whether the lock chain changed
> while we dropped the locks, even in the non-deadlock case. Otherwise
> we follow up the wrong chain there.
> 
> T0 blocked on L1 held by T1
> T1 blocked on L2 held by T2
> T2 blocked on L3 held by T3
> 
> So we walk the chain and do:
> 
>    T1 -> L2 -> T2
> 
> Now here we get preempted.
> 
> T3 releases L3
> T2 gets L3
> T2 drops L3 and L2
> T2 blocks on L4 held by T4
> T4 blocked on L5 held by T5
> 
> So we happily boost T4 and T5. Not what we really want to do.
> 
> Nasty, isn't it ?

As I stated before, it just wastes cycles. But looking at both your
next_lock code and this, I'm thinking we can simply break out if we
find that rt_mutex_owner(lock) != task. Because that means when we let
go of the locks, the current lock we are going up was released and
retaken. That would mean the chain walk should stop. It's similar to
the next lock being what we expected.

Perhaps something like this:

---
 kernel/locking/rtmutex.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Index: linux-rt.git/kernel/locking/rtmutex.c
===================================================================
--- linux-rt.git.orig/kernel/locking/rtmutex.c	2014-06-05 23:00:56.197855465 -0400
+++ linux-rt.git/kernel/locking/rtmutex.c	2014-06-05 23:14:44.164414857 -0400
@@ -284,7 +284,7 @@
 				      struct rt_mutex_waiter *orig_waiter,
 				      struct task_struct *top_task)
 {
-	struct rt_mutex *lock;
+	struct rt_mutex *lock = orig_lock;
 	struct rt_mutex_waiter *waiter, *top_waiter = orig_waiter;
 	int detect_deadlock, ret = 0, depth = 0;
 	unsigned long flags;
@@ -322,6 +322,16 @@
 	 */
 	raw_spin_lock_irqsave(&task->pi_lock, flags);
 
+	/*
+	 * When we dropped the spinlocks, if the owner of the lock we
+	 * are currently processing changed since we chain walked
+	 * to that lock, we are done with the chain walk. The previous
+	 * owner was obviously running to release it.
+	 */
+	if (lock && rt_mutex_owner(lock) != task)
+		goto out_unlock_pi;
+
+
 	waiter = task->pi_blocked_on;
 	/*
 	 * Check whether the end of the boosting chain has been


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-06  3:19       ` [PATCH 1/1] " Steven Rostedt
@ 2014-06-06  5:40         ` Thomas Gleixner
  2014-06-06  5:44           ` Thomas Gleixner
  2014-06-06  8:53           ` Steven Rostedt
  0 siblings, 2 replies; 23+ messages in thread
From: Thomas Gleixner @ 2014-06-06  5:40 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Brad Mouring, linux-rt-users, LKML, Peter Zijlstra, Ingo Molnar,
	Clark Williams

On Thu, 5 Jun 2014, Steven Rostedt wrote:
> On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> Thomas Gleixner <tglx@linutronix.de> wrote:
> +	/*
> +	 * When we dropped the spinlocks, if the owner of the lock we
> +	 * are currently processing changed since we chain walked
> +	 * to that lock, we are done with the chain walk. The previous
> +	 * owner was obviously running to release it.
> +	 */
> +	if (lock && rt_mutex_owner(lock) != task)
> +		goto out_unlock_pi;

NO. You CANNOT dereference lock after dropping the locks. It might be
gone already.

Thanks,

	tglx


* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-06  5:40         ` Thomas Gleixner
@ 2014-06-06  5:44           ` Thomas Gleixner
  2014-06-06  8:53           ` Steven Rostedt
  1 sibling, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2014-06-06  5:44 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Brad Mouring, linux-rt-users, LKML, Peter Zijlstra, Ingo Molnar,
	Clark Williams

On Fri, 6 Jun 2014, Thomas Gleixner wrote:
> On Thu, 5 Jun 2014, Steven Rostedt wrote:
> > On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> > +	/*
> > +	 * When we dropped the spinlocks, if the owner of the lock we
> > +	 * are currently processing changed since we chain walked
> > +	 * to that lock, we are done with the chain walk. The previous
> > +	 * owner was obviously running to release it.
> > +	 */
> > +	if (lock && rt_mutex_owner(lock) != task)
> > +		goto out_unlock_pi;
> 
> NO. You CANNOT dereference lock after dropping the locks. It might be
> gone already.

The only information we can check is whether @task changed the
lock it is blocked on, really.
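
In sketch form, such a check could look like this (untested, and
'prev_waiter' is hypothetical, the waiter we followed before dropping
the locks):

        raw_spin_lock_irqsave(&task->pi_lock, flags);
        /*
         * If @task is no longer blocked, or is now blocked on a
         * different waiter than the one we walked through before
         * dropping the locks, the chain changed under us. Do not
         * chase the stale pointers.
         */
        if (task->pi_blocked_on != prev_waiter)
                goto out_unlock_pi;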

Thanks,

	tglx




* Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes
  2014-06-06  5:40         ` Thomas Gleixner
  2014-06-06  5:44           ` Thomas Gleixner
@ 2014-06-06  8:53           ` Steven Rostedt
  1 sibling, 0 replies; 23+ messages in thread
From: Steven Rostedt @ 2014-06-06  8:53 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Brad Mouring, linux-rt-users, LKML, Peter Zijlstra, Ingo Molnar,
	Clark Williams

On Fri, 6 Jun 2014 07:40:10 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 5 Jun 2014, Steven Rostedt wrote:
> > On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> > Thomas Gleixner <tglx@linutronix.de> wrote:
> > +	/*
> > +	 * When we dropped the spinlocks, if the owner of the lock we
> > +	 * are currently processing changed since we chain walked
> > +	 * to that lock, we are done with the chain walk. The previous
> > +	 * owner was obviously running to release it.
> > +	 */
> > +	if (lock && rt_mutex_owner(lock) != task)
> > +		goto out_unlock_pi;
> 
> NO. You CANNOT dereference lock after dropping the locks. It might be
> gone already.
> 

Ah, the lock can be freed (gone from memory, that is). Why didn't you say
so in the first place ;-)

-- Steve

