linux-kernel.vger.kernel.org archive mirror
* lockdep false positive in double_lock_balance()?
       [not found] ` <alpine.LSU.2.00.1204251411460.4188@eggly.anvils>
@ 2012-05-16 20:00   ` Roland Dreier
  2012-05-17 19:19     ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Roland Dreier @ 2012-05-16 20:00 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar; +Cc: linux-kernel

Hi scheduler hackers,

I'm very occasionally seeing the lockdep warning below on our boxes
running 2.6.39 (PREEMPT=n, so "unfair" _double_lock_balance()).  I
think I see the explanation, and it's probably not even worth fixing:

On the unlock side, we have:

	static inline void double_unlock_balance(struct rq *this_rq, struct rq *busiest)
		__releases(busiest->lock)
	{
		raw_spin_unlock(&busiest->lock);
		lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_);
	}

while on the lock side we have:

	static int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
		__releases(this_rq->lock)
		__acquires(busiest->lock)
		__acquires(this_rq->lock)
	{
		int ret = 0;

		if (unlikely(!raw_spin_trylock(&busiest->lock))) {
			if (busiest < this_rq) {
				raw_spin_unlock(&this_rq->lock);
				raw_spin_lock(&busiest->lock);
				raw_spin_lock_nested(&this_rq->lock,
						      SINGLE_DEPTH_NESTING);
				ret = 1;
			} else
				raw_spin_lock_nested(&busiest->lock,
						      SINGLE_DEPTH_NESTING);
		}
		return ret;
	}

So it seems we have the following (purely lockdep-related) race:

    unlock:				lock:

					if (unlikely(!raw_spin_trylock(&busiest->lock))) { //fail to lock

    raw_spin_unlock(&busiest->lock);

						if (busiest < this_rq) { //not true
						} else
							raw_spin_lock_nested(&busiest->lock,
									      SINGLE_DEPTH_NESTING);

    lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_); //too late

where we end up trying to take a second lock with SINGLE_DEPTH_NESTING
before we've promoted our first lock to subclass 0.

I'm not sure this is easily fixable (if we moved the
lock_set_subclass() call to before the raw_spin_unlock(), would we
introduce other lockdep problems?).  Mostly I want to make sure my
diagnosis is correct and that there's no actual deadlock that
could happen.

So does this make sense?

Thanks,
  Roland

Here's the actual lockdep warning:

[89945.638847] =============================================
[89945.638974] [ INFO: possible recursive locking detected ]
[89945.639033] 2.6.39.3-dbg+ #13245
[89945.639079] ---------------------------------------------
[89945.639131] foed/7820 is trying to acquire lock:
[89945.639180]  (&rq->lock/1){..-.-.}, at: [<ffffffff8103fa1a>] double_lock_balance+0x5a/0x90
[89945.639291] 
[89945.639292] but task is already holding lock:
[89945.639374]  (&rq->lock/1){..-.-.}, at: [<ffffffff8103fa3b>] double_lock_balance+0x7b/0x90
[89945.639480] 
[89945.639481] other info that might help us debug this:
[89945.639566] 2 locks held by foed/7820:
[89945.639611]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8156a445>] do_page_fault+0x1f5/0x560
[89945.639717]  #1:  (&rq->lock/1){..-.-.}, at: [<ffffffff8103fa3b>] double_lock_balance+0x7b/0x90
[89945.639829] 
[89945.639830] stack backtrace:
[89945.639908] Pid: 7820, comm: foed Tainted: G        W   2.6.39.3-dbg+ #13245
[89945.639965] Call Trace:
[89945.640011]  [<ffffffff8109293f>] __lock_acquire+0x153f/0x1da0
[89945.640069]  [<ffffffff81009235>] ? native_sched_clock+0x15/0x70
[89945.640126]  [<ffffffff810812ef>] ? local_clock+0x6f/0x80
[89945.640179]  [<ffffffff8103fa3b>] ? double_lock_balance+0x7b/0x90
[89945.640235]  [<ffffffff8109384d>] lock_acquire+0x9d/0x130
[89945.640288]  [<ffffffff8103fa1a>] ? double_lock_balance+0x5a/0x90
[89945.640345]  [<ffffffff810fe47f>] ? cpupri_find+0xcf/0x140
[89945.640401]  [<ffffffff81565ce4>] _raw_spin_lock_nested+0x34/0x70
[89945.640456]  [<ffffffff8103fa1a>] ? double_lock_balance+0x5a/0x90
[89945.640512]  [<ffffffff8103fa1a>] double_lock_balance+0x5a/0x90
[89945.640568]  [<ffffffff8104c546>] push_rt_task+0xc6/0x290
[89945.640621]  [<ffffffff8104c7c0>] push_rt_tasks+0x20/0x30
[89945.640674]  [<ffffffff8104c83e>] post_schedule_rt+0xe/0x10
[89945.640729]  [<ffffffff81562d8c>] schedule+0x53c/0xa70
[89945.640781]  [<ffffffff810941c8>] ? mark_held_locks+0x78/0xb0
[89945.640836]  [<ffffffff81565b15>] rwsem_down_failed_common+0xb5/0x150
[89945.640893]  [<ffffffff8108122d>] ? sched_clock_cpu+0xbd/0x110
[89945.640949]  [<ffffffff8108e33d>] ? trace_hardirqs_off+0xd/0x10
[89945.641004]  [<ffffffff81565be5>] rwsem_down_read_failed+0x15/0x17
[89945.641061]  [<ffffffff812c0a54>] call_rwsem_down_read_failed+0x14/0x30
[89945.641120]  [<ffffffff81139565>] ? sys_mprotect+0xe5/0x230
[89945.641174]  [<ffffffff81564f1e>] ? down_read+0x7e/0xa0
[89945.641227]  [<ffffffff8156a445>] ? do_page_fault+0x1f5/0x560
[89945.641281]  [<ffffffff8156a445>] do_page_fault+0x1f5/0x560
[89945.641336]  [<ffffffff812bb5fc>] ? rwsem_wake+0x4c/0x60
[89945.641388]  [<ffffffff812c0aa7>] ? call_rwsem_wake+0x17/0x30
[89945.641443]  [<ffffffff812c0b5d>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[89945.641501]  [<ffffffff81566d25>] page_fault+0x25/0x30


* Re: lockdep false positive in double_lock_balance()?
  2012-05-16 20:00   ` lockdep false positive in double_lock_balance()? Roland Dreier
@ 2012-05-17 19:19     ` Peter Zijlstra
  2012-05-18 14:43       ` Steven Rostedt
                         ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Peter Zijlstra @ 2012-05-17 19:19 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Ingo Molnar, linux-kernel, rostedt

On Wed, 2012-05-16 at 13:00 -0700, Roland Dreier wrote:
> Hi scheduler hackers,
> 
> I'm very occasionally seeing the lockdep warning below on our boxes
> running 2.6.39 (PREEMPT=n, so "unfair" _double_lock_balance()).  I
> think I see the explanation, and it's probably not even worth fixing:
> 
> On the unlock side, we have:

<snip>

> while on the lock side we have:

<snip>

> So it seems we have the following (purely lockdep-related) race:
> 
>     unlock:				lock:
> 
> 					if (unlikely(!raw_spin_trylock(&busiest->lock))) { //fail to lock
> 
>     raw_spin_unlock(&busiest->lock);
> 
> 						if (busiest < this_rq) { //not true
> 						} else
> 							raw_spin_lock_nested(&busiest->lock,
> 									      SINGLE_DEPTH_NESTING);
> 
>     lock_set_subclass(&this_rq->lock.dep_map, 0, _RET_IP_); //too late
> 
> where we end up trying to take a second lock with SINGLE_DEPTH_NESTING
> before we've promoted our first lock to subclass 0.

*phew* you actually made me think there ;-)

Anyway, it all sounds very plausible, which is what threw me, but it's
wrong :-)

The race you describe exists, except that's not how lockdep works. Both
CPUs would be running different tasks (one would hope to presume), and
the held-lock stack is per task. So even if busiest_rq on cpu1 (lock
case) is the same lock as this_rq on cpu0 (unlock case), they're in
different stacks with different states.

> So does this make sense?

Almost :-)

> Here's the actual lockdep warning:

> [89945.640512]  [<ffffffff8103fa1a>] double_lock_balance+0x5a/0x90
> [89945.640568]  [<ffffffff8104c546>] push_rt_task+0xc6/0x290

this is the clue: if you look at that code you'll find the
double_lock_balance() in question is the one in find_lock_lowest_rq()
[yay for inlining].

Now find_lock_lowest_rq() has a bug: it fails to use
double_unlock_balance() in one exit path. If this results in a retry in
push_rt_task(), we'll call double_lock_balance() again, at which point
we'll run into said issue.

Presumably this is all rather rare..

Something like this should fix it I think..


---
 kernel/sched/rt.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index c5565c3..b649108 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1556,7 +1556,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 				     task_running(rq, task) ||
 				     !task->on_rq)) {
 
-				raw_spin_unlock(&lowest_rq->lock);
+				double_unlock_balance(rq, lowest_rq);
 				lowest_rq = NULL;
 				break;
 			}



* Re: lockdep false positive in double_lock_balance()?
  2012-05-17 19:19     ` Peter Zijlstra
@ 2012-05-18 14:43       ` Steven Rostedt
  2012-05-18 16:42       ` Roland Dreier
  2012-06-06 15:54       ` [tip:sched/urgent] sched/rt: Fix lockdep annotation within find_lock_lowest_rq() tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 5+ messages in thread
From: Steven Rostedt @ 2012-05-18 14:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Roland Dreier, Ingo Molnar, linux-kernel

On Thu, 2012-05-17 at 21:19 +0200, Peter Zijlstra wrote:

> Something like this should fix it I think..
> 
> 
> ---
>  kernel/sched/rt.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
> index c5565c3..b649108 100644
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -1556,7 +1556,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
>  				     task_running(rq, task) ||
>  				     !task->on_rq)) {
>  
> -				raw_spin_unlock(&lowest_rq->lock);
> +				double_unlock_balance(rq, lowest_rq);

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

>  				lowest_rq = NULL;
>  				break;
>  			}




* Re: lockdep false positive in double_lock_balance()?
  2012-05-17 19:19     ` Peter Zijlstra
  2012-05-18 14:43       ` Steven Rostedt
@ 2012-05-18 16:42       ` Roland Dreier
  2012-06-06 15:54       ` [tip:sched/urgent] sched/rt: Fix lockdep annotation within find_lock_lowest_rq() tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 5+ messages in thread
From: Roland Dreier @ 2012-05-18 16:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ingo Molnar, linux-kernel, rostedt

> *phew* you actually made me think there ;-)

Sorry for that.

> The race you describe exists, except that's not how lockdep works. Both
> cpu's would have a different task (one would hope to presume) and the
> held lock stack is per task. So even if busiest_rq on cpu1 (lock case)
> is the same lock as this_rq on cpu0 (unlock case), they're in different
> stacks with different states.

Got it.  I confused myself between user processes (which might have
multiple threads on multiple CPUs) and kernel tasks.

> Something like this should fix it I think..

Thanks, I'll throw this in and let you know if we see this again.
But it's very rare, so it will be hard to know one way or the other
for quite a while.

 - R.


* [tip:sched/urgent] sched/rt: Fix lockdep annotation within find_lock_lowest_rq()
  2012-05-17 19:19     ` Peter Zijlstra
  2012-05-18 14:43       ` Steven Rostedt
  2012-05-18 16:42       ` Roland Dreier
@ 2012-06-06 15:54       ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 5+ messages in thread
From: tip-bot for Peter Zijlstra @ 2012-06-06 15:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, rostedt, a.p.zijlstra, roland, tglx

Commit-ID:  7f1b43936f0ecad14770634c021cf4a929aec74d
Gitweb:     http://git.kernel.org/tip/7f1b43936f0ecad14770634c021cf4a929aec74d
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Thu, 17 May 2012 21:19:46 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Wed, 6 Jun 2012 16:52:26 +0200

sched/rt: Fix lockdep annotation within find_lock_lowest_rq()

Roland Dreier reported spurious, hard to trigger lockdep warnings
within the scheduler - without any real lockup.

This bit gives us the right clue:

> [89945.640512]  [<ffffffff8103fa1a>] double_lock_balance+0x5a/0x90
> [89945.640568]  [<ffffffff8104c546>] push_rt_task+0xc6/0x290

if you look at that code you'll find the double_lock_balance() in
question is the one in find_lock_lowest_rq() [yay for inlining].

Now find_lock_lowest_rq() has a bug: it fails to use
double_unlock_balance() in one exit path. If this results in a retry in
push_rt_task() we'll call double_lock_balance() again, at which point
we'll run into said lockdep confusion.

Reported-by: Roland Dreier <roland@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337282386.4281.77.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/rt.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 2a4e8df..573e1ca 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1562,7 +1562,7 @@ static struct rq *find_lock_lowest_rq(struct task_struct *task, struct rq *rq)
 				     task_running(rq, task) ||
 				     !task->on_rq)) {
 
-				raw_spin_unlock(&lowest_rq->lock);
+				double_unlock_balance(rq, lowest_rq);
 				lowest_rq = NULL;
 				break;
 			}

