linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Need lockdep help
@ 2007-12-02 19:45 Alan Stern
  2007-12-02 20:04 ` Arjan van de Ven
  2007-12-03 10:36 ` Jarek Poplawski
  0 siblings, 2 replies; 11+ messages in thread
From: Alan Stern @ 2007-12-02 19:45 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linux-pm mailing list, Kernel development list

Ingo:

I ran into a lockdep reporting issue just now with some new code under 
development.  I think it's a false positive; the question is how best 
to deal with it.

Here's the situation.  The new code runs during a system sleep (i.e., 
suspend or hibernation).  Certain activities have to be deferred during 
a system sleep, so I defined an rwsem: system_sleep_in_progress_rwsem.

Subroutines carrying out these activities acquire a read lock on the
rwsem, so normally they proceed with no hindrance.  During a sleep
transition, I acquire a write lock -- this is done via a PM-notifier
callout routine.  That is, during a PM_HIBERNATION_PREPARE or
PM_SUSPEND_PREPARE notification the routine does down_write(), and
during a PM_POST_HIBERNATION or PM_POST_SUSPEND notification the
routine does up_write().

The problem is that the notifier chain itself is under the control of 
an rwsem (to prevent the chain from being modified while it is in use).  
The resulting actions look like this:

System sleep start:
		down_read(notifier-chain rwsem);
		call the notifier routine
			down_write(&system_sleep_in_progress_rwsem);
		up_read(notifier-chain rwsem);

System sleep end:
		down_read(notifier-chain rwsem);
		call the notifier routine
			up_write(&system_sleep_in_progress_rwsem);
		up_read(notifier-chain rwsem);

This creates a lockdep violation; each rwsem in turn is locked while 
the other is being held.  However the only way this could lead to 
deadlock would be if there was already a bug in the system Power 
Management code (overlapping notifications).

Perhaps the system_sleep_in_progress_rwsem could be replaced by some 
other sort of synchronizing mechanism, but I don't want to change it -- 
an rwsem really does seem to be the correct thing to use here.

What do you suggest?

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-02 19:45 Need lockdep help Alan Stern
@ 2007-12-02 20:04 ` Arjan van de Ven
  2007-12-02 20:08   ` Alan Stern
  2007-12-03 10:36 ` Jarek Poplawski
  1 sibling, 1 reply; 11+ messages in thread
From: Arjan van de Ven @ 2007-12-02 20:04 UTC (permalink / raw)
  To: Alan Stern; +Cc: Ingo Molnar, Linux-pm mailing list, Kernel development list

On Sun, 2 Dec 2007 14:45:32 -0500 (EST)
Alan Stern <stern@rowland.harvard.edu> wrote:

> Ingo:
> 
> I ran into a lockdep reporting issue just now with some new code
> under development.  I think it's a false positive; the question is
> how best to deal with it.
> 
> Here's the situation.  The new code runs during a system sleep (i.e., 
> suspend or hibernation).  Certain activities have to be deferred
> during a system sleep, so I defined an rwsem:
> system_sleep_in_progress_rwsem.
> 
> Subroutines carrying out these activities acquire a read lock on the
> rwsem, so normally they proceed with no hindrance.  During a sleep
> transition, I acquire a write lock -- this is done via a PM-notifier
> callout routine.  That is, during a PM_HIBERNATION_PREPARE or
> PM_SUSPEND_PREPARE notification the routine does down_write(), and
> during a PM_POST_HIBERNATION or PM_POST_SUSPEND notification the
> routine does up_write().
> 
> The problem is that the notifier chain itself is under the control of 
> an rwsem (to prevent the chain from being modified while it is in
> use). The resulting actions look like this:
> 
> System sleep start:
> 		down_read(notifier-chain rwsem);
> 		call the notifier routine
> 			down_write(&system_sleep_in_progress_rwsem);
> 		up_read(notifier-chain rwsem);
> 
> System sleep end:
> 		down_read(notifier-chain rwsem);
> 		call the notifier routine
> 			up_write(&system_sleep_in_progress_rwsem);
> 		up_read(notifier-chain rwsem);
> 
> This creates a lockdep violation; each rwsem in turn is locked while 
> the other is being held.  However the only way this could lead to 
> deadlock would be if there was already a bug in the system Power 
> Management code (overlapping notifications).

or.. modifications to the notifier chain while all this is happening
(remember: rwsems are fair, once a writer shows up, all readers wait)


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-02 20:04 ` Arjan van de Ven
@ 2007-12-02 20:08   ` Alan Stern
  0 siblings, 0 replies; 11+ messages in thread
From: Alan Stern @ 2007-12-02 20:08 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Ingo Molnar, Linux-pm mailing list, Kernel development list

On Sun, 2 Dec 2007, Arjan van de Ven wrote:

> > This creates a lockdep violation; each rwsem in turn is locked while 
> > the other is being held.  However the only way this could lead to 
> > deadlock would be if there was already a bug in the system Power 
> > Management code (overlapping notifications).
> 
> or.. modifications to the notifier chain while all this is happening
> (remember: rwsems are fair, once a writer shows up, all readers wait)

But modifications to the notifier chain don't invoke the callout
routines.  Hence they won't try to lock the new rwsem and won't lead to
deadlock.

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-02 19:45 Need lockdep help Alan Stern
  2007-12-02 20:04 ` Arjan van de Ven
@ 2007-12-03 10:36 ` Jarek Poplawski
  2007-12-03 15:08   ` Alan Stern
  1 sibling, 1 reply; 11+ messages in thread
From: Jarek Poplawski @ 2007-12-03 10:36 UTC (permalink / raw)
  To: Alan Stern
  Cc: Ingo Molnar, Linux-pm mailing list, Kernel development list,
	Peter Zijlstra

On 02-12-2007 20:45, Alan Stern wrote:
> Ingo:
> 
> I ran into a lockdep reporting issue just now with some new code under 
> development.  I think it's a false positive; the question is how best 
> to deal with it.
> 
> Here's the situation.  The new code runs during a system sleep (i.e., 
> suspend or hibernation).  Certain activities have to be deferred during 
> a system sleep, so I defined an rwsem: system_sleep_in_progress_rwsem.
> 
> Subroutines carrying out these activities acquire a read lock on the
> rwsem, so normally they proceed with no hindrance.  During a sleep
> transition, I acquire a write lock -- this is done via a PM-notifier
> callout routine.  That is, during a PM_HIBERNATION_PREPARE or
> PM_SUSPEND_PREPARE notification the routine does down_write(), and
> during a PM_POST_HIBERNATION or PM_POST_SUSPEND notification the
> routine does up_write().
> 
> The problem is that the notifier chain itself is under the control of 
> an rwsem (to prevent the chain from being modified while it is in use).  
> The resulting actions look like this:
> 
> System sleep start:
> 		down_read(notifier-chain rwsem);
> 		call the notifier routine
> 			down_write(&system_sleep_in_progress_rwsem);
> 		up_read(notifier-chain rwsem);
> 
> System sleep end:
> 		down_read(notifier-chain rwsem);
> 		call the notifier routine
> 			up_write(&system_sleep_in_progress_rwsem);
> 		up_read(notifier-chain rwsem);
> 
> This creates a lockdep violation; each rwsem in turn is locked while 
> the other is being held.  However the only way this could lead to 
> deadlock would be if there was already a bug in the system Power 
> Management code (overlapping notifications).

Actually, IMHO, there is no reason for any lockdep violation:

thread #1: has down_read(A); waits for #2 to down_write(B)
thread #2: has down_write(B); never waits for #1 to down_read(A)

So, deadlock isn't possible here. If lockdep reports something else it
should be fixed (and you'd be right to omit lockdep until this is
done).

Regards,
Jarek P.

PS: Peter Zijlstra added to CC

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-03 10:36 ` Jarek Poplawski
@ 2007-12-03 15:08   ` Alan Stern
  2007-12-03 23:25     ` Jarek Poplawski
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2007-12-03 15:08 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Ingo Molnar, Linux-pm mailing list, Kernel development list,
	Peter Zijlstra

On Mon, 3 Dec 2007, Jarek Poplawski wrote:

> > System sleep start:
> > 		down_read(notifier-chain rwsem);
> > 		call the notifier routine
> > 			down_write(&system_sleep_in_progress_rwsem);
> > 		up_read(notifier-chain rwsem);
> > 
> > System sleep end:
> > 		down_read(notifier-chain rwsem);
> > 		call the notifier routine
> > 			up_write(&system_sleep_in_progress_rwsem);
> > 		up_read(notifier-chain rwsem);
> > 
> > This creates a lockdep violation; each rwsem in turn is locked while 
> > the other is being held.  However the only way this could lead to 
> > deadlock would be if there was already a bug in the system Power 
> > Management code (overlapping notifications).
> 
> Actually, IMHO, there is no reason for any lockdep violation:
> 
> thread #1: has down_read(A); waits for #2 to down_write(B)
> thread #2: has down_write(B); never waits for #1 to down_read(A)
> 
> So, deadlock isn't possible here. If lockdep reports something else it
> should be fixed (and you'd be right to omit lockdep until this is
> done).

I think the reasoning goes the way Arjan described.  Suppose in between
#1 and #2 there is thread #3 trying to do down_write(A) and waiting for
#1.  Then thread #2 doesn't have to wait for #1 directly, but it would
have to wait for #3.

In my case the simplest answer appears to be the replace the rwsem
with something slightly more complicated (a mutex plus a boolean flag 
-- the loss of concurrency won't matter much since it isn't on a hot 
path).

Thanks for the comment.

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-03 15:08   ` Alan Stern
@ 2007-12-03 23:25     ` Jarek Poplawski
  2007-12-04 15:17       ` Alan Stern
  0 siblings, 1 reply; 11+ messages in thread
From: Jarek Poplawski @ 2007-12-03 23:25 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jarek Poplawski, Ingo Molnar, Linux-pm mailing list,
	Kernel development list, Peter Zijlstra

Alan Stern wrote, On 12/03/2007 04:08 PM:

> On Mon, 3 Dec 2007, Jarek Poplawski wrote:
> 
>>> System sleep start:
>>> 		down_read(notifier-chain rwsem);
>>> 		call the notifier routine
>>> 			down_write(&system_sleep_in_progress_rwsem);
>>> 		up_read(notifier-chain rwsem);
>>>
>>> System sleep end:
>>> 		down_read(notifier-chain rwsem);
>>> 		call the notifier routine
>>> 			up_write(&system_sleep_in_progress_rwsem);
>>> 		up_read(notifier-chain rwsem);
>>>
>>> This creates a lockdep violation; each rwsem in turn is locked while 
>>> the other is being held.  However the only way this could lead to 
>>> deadlock would be if there was already a bug in the system Power 
>>> Management code (overlapping notifications).
>> Actually, IMHO, there is no reason for any lockdep violation:
>>
>> thread #1: has down_read(A); waits for #2 to down_write(B)
>> thread #2: has down_write(B); never waits for #1 to down_read(A)
>>
>> So, deadlock isn't possible here. If lockdep reports something else it
>> should be fixed (and you'd be right to omit lockdep until this is
>> done).
> 
> I think the reasoning goes the way Arjan described.  Suppose in between
> #1 and #2 there is thread #3 trying to do down_write(A) and waiting for
> #1.  Then thread #2 doesn't have to wait for #1 directly, but it would
> have to wait for #3.


As a matter of fact I completely missed Arjan's point because I thought
you described these locks according to lockdep's report, and there is
an information about read or write... Since, you still seem to guess
above about possible scenario, maybe it would be easier to show this
report (and maybe a piece of this code if possible)?

Btw., if it were like you're suggesting, it still shouldn't make any
difference: if thread #3 is only waiting for the lock taken for reading,
then I can't see why thread #2 has to wait for anything. Probably more
dangerous, at least for lockdep, could look taking down_write(A) by
thread #1 in between, but if it all were possible only within one
thread, then still there should be no reason to change good program
only to please lockdep.
 

> In my case the simplest answer appears to be the replace the rwsem
> with something slightly more complicated (a mutex plus a boolean flag 
> -- the loss of concurrency won't matter much since it isn't on a hot 
> path).

I'm not sure I can understand your plan, but I doubt there should be
such problems with taking rwsem for sleeping, so maybe it would be
better to figure out what really scares lockdep, to fix the right place?

Jarek P.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-03 23:25     ` Jarek Poplawski
@ 2007-12-04 15:17       ` Alan Stern
  2007-12-04 19:02         ` Jarek Poplawski
  2007-12-04 21:00         ` Ingo Molnar
  0 siblings, 2 replies; 11+ messages in thread
From: Alan Stern @ 2007-12-04 15:17 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Jarek Poplawski, Ingo Molnar, Linux-pm mailing list,
	Kernel development list, Peter Zijlstra

On Tue, 4 Dec 2007, Jarek Poplawski wrote:

> Alan Stern wrote, On 12/03/2007 04:08 PM:
> 
> > On Mon, 3 Dec 2007, Jarek Poplawski wrote:
> > 

> >> Actually, IMHO, there is no reason for any lockdep violation:
> >>
> >> thread #1: has down_read(A); waits for #2 to down_write(B)
> >> thread #2: has down_write(B); never waits for #1 to down_read(A)
> >>
> >> So, deadlock isn't possible here. If lockdep reports something else it
> >> should be fixed (and you'd be right to omit lockdep until this is
> >> done).
> > 
> > I think the reasoning goes the way Arjan described.  Suppose in between
> > #1 and #2 there is thread #3 trying to do down_write(A) and waiting for
> > #1.  Then thread #2 doesn't have to wait for #1 directly, but it would
> > have to wait for #3.
> 
> 
> As a matter of fact I completely missed Arjan's point because I thought
> you described these locks according to lockdep's report, and there is
> an information about read or write... Since, you still seem to guess
> above about possible scenario, maybe it would be easier to show this
> report (and maybe a piece of this code if possible)?

I have changed my new code, so now the problem is avoided.

> Btw., if it were like you're suggesting, it still shouldn't make any
> difference: if thread #3 is only waiting for the lock taken for reading,
> then I can't see why thread #2 has to wait for anything.

The deadlock occurs because:

	Thread #1 has locked A for reading and is waiting for
	#2 to release B so that #1 can lock B for writing.

	Thread #3 is waiting for #1 to release A so that #3
	can lock A for writing.

	Thread #2 has locked B for writing and is waiting for
	#3 to finish using A so that #2 can lock A for reading.

So #1 is waiting for #2, #2 is waiting for #3, and #3 is waiting for 
#1.  Lockdep warns because this might happen.

Perhaps the point you're missing is that if #3 does down_write(A)  
before #2 does down_read(A) then #2 has to wait until #3 releases the
write lock.  #2 isn't allowed to sneak in front of #3; that's what
Arjan meant when he said that rwsems are fair.

The same sort of problem occurs when a thread does nested read locks on
a single rwsem.  If a different thread were to try to acquire a write
lock in between, another deadlock would occur.  That's why lockdep
warns about nested (or recursive) read locks.

> Probably more
> dangerous, at least for lockdep, could look taking down_write(A) by
> thread #1 in between, but if it all were possible only within one
> thread, then still there should be no reason to change good program
> only to please lockdep.

I believe the idea behind lockdep is to warn about situations that
might reasonably be expected to occur.  A thread deadlocking itself by
doing down_read(A) followed immediately by down_write(A) isn't
reasonable, so lockdep doesn't worry about that possibility.

> > In my case the simplest answer appears to be the replace the rwsem
> > with something slightly more complicated (a mutex plus a boolean flag 
> > -- the loss of concurrency won't matter much since it isn't on a hot 
> > path).
> 
> I'm not sure I can understand your plan, but I doubt there should be
> such problems with taking rwsem for sleeping, so maybe it would be
> better to figure out what really scares lockdep, to fix the right place?

The real problem is that lockdep has to make some generalizations.  It
can't be aware of the details of every possible situation, and it
doesn't have a global view of the entire kernel, so it doesn't know
when special circumstances make deadlock impossible.

Furthermore, in this case deadlock isn't really impossible -- it could 
occur if there were a bug somewhere else in the kernel.  So lockdep was 
correct to warn that deadlock might occur.

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-04 15:17       ` Alan Stern
@ 2007-12-04 19:02         ` Jarek Poplawski
  2007-12-04 19:28           ` Alan Stern
  2007-12-04 21:00         ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread
From: Jarek Poplawski @ 2007-12-04 19:02 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jarek Poplawski, Ingo Molnar, Linux-pm mailing list,
	Kernel development list, Peter Zijlstra

Alan Stern wrote, On 12/04/2007 04:17 PM:
...

> Furthermore, in this case deadlock isn't really impossible -- it could 
> occur if there were a bug somewhere else in the kernel.  So lockdep was 
> correct to warn that deadlock might occur.


Alan, if the scenario was like you described at the beginning, there was
no deadlock possible, unless some errors in the notifier. These #1-#3
threads were only helpful to guess what lockdep could 'think', but I
guess notifier doesn't use 2 rivaling threads for a wake, so, lockdep
probably needed additional information. And you really can't consider
any hypothetical kernel bugs here because then each lock is vulnerable.

Regards,
Jarek P.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-04 19:02         ` Jarek Poplawski
@ 2007-12-04 19:28           ` Alan Stern
  2007-12-04 20:14             ` Jarek Poplawski
  0 siblings, 1 reply; 11+ messages in thread
From: Alan Stern @ 2007-12-04 19:28 UTC (permalink / raw)
  To: Jarek Poplawski
  Cc: Jarek Poplawski, Ingo Molnar, Linux-pm mailing list,
	Kernel development list, Peter Zijlstra

On Tue, 4 Dec 2007, Jarek Poplawski wrote:

> Alan Stern wrote, On 12/04/2007 04:17 PM:
> ...
> 
> > Furthermore, in this case deadlock isn't really impossible -- it could 
> > occur if there were a bug somewhere else in the kernel.  So lockdep was 
> > correct to warn that deadlock might occur.
> 
> 
> Alan, if the scenario was like you described at the beginning, there was
> no deadlock possible, unless some errors in the notifier.

Or errors in the notifier's caller.

> These #1-#3
> threads were only helpful to guess what lockdep could 'think', but I
> guess notifier doesn't use 2 rivaling threads for a wake, so, lockdep
> probably needed additional information. And you really can't consider
> any hypothetical kernel bugs here because then each lock is vulnerable.

But you have to consider hypothetical kernel bugs.  That's exactly what 
lockdep is for -- to warn you about possible deadlocks that could be 
caused by bugs.

As a simple example, if thread #1 does "lock(A); lock(B)" and thread 
#2 does "lock(B); lock(A)" then there's a possible bug.  Lockdep should 
warn about you, and it does -- even if those two threads can never run 
at the same time.

If lockdep warned about deadlocks only when they actually happened, it 
wouldn't be nearly so useful.

Alan Stern


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-04 19:28           ` Alan Stern
@ 2007-12-04 20:14             ` Jarek Poplawski
  0 siblings, 0 replies; 11+ messages in thread
From: Jarek Poplawski @ 2007-12-04 20:14 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jarek Poplawski, Ingo Molnar, Linux-pm mailing list,
	Kernel development list, Peter Zijlstra

Alan Stern wrote, On 12/04/2007 08:28 PM:

> On Tue, 4 Dec 2007, Jarek Poplawski wrote:

...

> But you have to consider hypothetical kernel bugs.  That's exactly what 
> lockdep is for -- to warn you about possible deadlocks that could be 
> caused by bugs.
> 
> As a simple example, if thread #1 does "lock(A); lock(B)" and thread 
> #2 does "lock(B); lock(A)" then there's a possible bug.  Lockdep should 
> warn about you, and it does -- even if those two threads can never run 
> at the same time.
> 
> If lockdep warned about deadlocks only when they actually happened, it 
> wouldn't be nearly so useful.


Sure! I probably missed your point... Lockdep always names reported locks,
so I meant 'hypothetical' only trying to explain lockdep with some other,
unknown or unnamed bugs.

So, depending on the code, above example with A & B could be a real bug
(even if very improbable but logically justified) or a false alarm (eg.
when we know both threads could never work at the same).

Jarek P.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Need lockdep help
  2007-12-04 15:17       ` Alan Stern
  2007-12-04 19:02         ` Jarek Poplawski
@ 2007-12-04 21:00         ` Ingo Molnar
  1 sibling, 0 replies; 11+ messages in thread
From: Ingo Molnar @ 2007-12-04 21:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jarek Poplawski, Jarek Poplawski, Linux-pm mailing list,
	Kernel development list, Peter Zijlstra


* Alan Stern <stern@rowland.harvard.edu> wrote:

> > I'm not sure I can understand your plan, but I doubt there should be 
> > such problems with taking rwsem for sleeping, so maybe it would be 
> > better to figure out what really scares lockdep, to fix the right 
> > place?
> 
> The real problem is that lockdep has to make some generalizations.  It 
> can't be aware of the details of every possible situation, and it 
> doesn't have a global view of the entire kernel, so it doesn't know 
> when special circumstances make deadlock impossible.
> 
> Furthermore, in this case deadlock isn't really impossible -- it could 
> occur if there were a bug somewhere else in the kernel.  So lockdep 
> was correct to warn that deadlock might occur.

ok. Thanks for resolving this by working it around - i _think_ in the 
long run we'd like to make all locking "simpler" - not just for the sake 
of lockdep, but also for the sake of human reviewability. So in that 
sense, even though in this particular case you are fully right that 
lockdep was wrong, we benefit long-term.

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2007-12-04 21:00 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-02 19:45 Need lockdep help Alan Stern
2007-12-02 20:04 ` Arjan van de Ven
2007-12-02 20:08   ` Alan Stern
2007-12-03 10:36 ` Jarek Poplawski
2007-12-03 15:08   ` Alan Stern
2007-12-03 23:25     ` Jarek Poplawski
2007-12-04 15:17       ` Alan Stern
2007-12-04 19:02         ` Jarek Poplawski
2007-12-04 19:28           ` Alan Stern
2007-12-04 20:14             ` Jarek Poplawski
2007-12-04 21:00         ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).