linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Modeling wait events with Lockdep
@ 2022-06-30 23:05 Ioannis Angelakopoulos
  2022-07-01 11:59 ` Jan Kara
  0 siblings, 1 reply; 3+ messages in thread
From: Ioannis Angelakopoulos @ 2022-06-30 23:05 UTC (permalink / raw)
  To: mingo, jack; +Cc: boris, josef, linux-kernel

Hello,

I would like to ask some questions regarding modeling waiting for events 
(i.e the wait_event) in Linux using Lockdep.
I am trying to model these events in btrfs since there are deadlocks 
detected involving waiting for events and Lockdep is not currently able 
to address them (e.g., 
https://lore.kernel.org/linux-btrfs/cover.1655147296.git.josef@toxicpanda.com/).

I am very new to Lockdep so I would like to know, what would be the 
correct way of implementing these models using Lockdep?
I noticed that JBD2 uses a read-write lockdep map. It takes the read 
lockdep map when it creates a transaction handle and unlocks the read 
lockdep map when it frees the handle. Also, every time the thread has to 
wait for resources (e.g., transaction credits) and the handle is not 
supposed to be alive, the thread locks and unlocks immediately the write 
lockdep map before the waiting event (maybe I understood something wrong 
here?). Is this the only Lockdep model that can be used for these 
waiting events?

For your reference, here are 2 examples that we are trying to annotate 
with Lockdep and we would like to know if we are on the correct track.

In the first example it makes sense to use the JBD2 model, however we 
are not sure how to apply the model in the second case. The comments 
indicate our concerns.

------------------------------
Simple Case:

TA
rwsem_acquire_read(lockdep_map);
cond=false
do_work()
cond=true
rwsem_release_read(lockdep_map);
signal(w)

TB
rwsem_acquire(lockdep_map);
rswem_release(lockdep_map);
wait_event(w, cond==true)

Advanced Case:

TA
rwsem_acquire_read(lockdep_map)
cond=false
// exits while holding the lock

TB
cond=true
rwsem_release_read(lockdep_map) // We do not know that we hold the lock
signal(w)

TC
rwsem_acquire(lockdep_map);
rswem_release(lockdep_map);
wait_event(w, cond==true)

------------------------------

I am aware that DEPT (Dependency Tracker) which targets waiting for 
events in Linux is under development.
However, I need to implement these Lockdep models to aid the btrfs 
developers to catch existing deadlocks until DEPT is mature enough to be 
used.

Thanks,
Ioannis

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Modeling wait events with Lockdep
  2022-06-30 23:05 Modeling wait events with Lockdep Ioannis Angelakopoulos
@ 2022-07-01 11:59 ` Jan Kara
  2022-07-06  0:51   ` Ioannis Angelakopoulos
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Kara @ 2022-07-01 11:59 UTC (permalink / raw)
  To: Ioannis Angelakopoulos; +Cc: mingo, jack, boris, josef, linux-kernel

Hello!

On Thu 30-06-22 23:05:07, Ioannis Angelakopoulos wrote:
> I would like to ask some questions regarding modeling waiting for events 
> (i.e the wait_event) in Linux using Lockdep.
> I am trying to model these events in btrfs since there are deadlocks 
> detected involving waiting for events and Lockdep is not currently able 
> to address them (e.g., 
> https://lore.kernel.org/linux-btrfs/cover.1655147296.git.josef@toxicpanda.com/).
> 
> I am very new to Lockdep so I would like to know, what would be the 
> correct way of implementing these models using Lockdep?
>
> I noticed that JBD2 uses a read-write lockdep map. It takes the read 
> lockdep map when it creates a transaction handle and unlocks the read 
> lockdep map when it frees the handle. Also, every time the thread has to 
> wait for resources (e.g., transaction credits) and the handle is not 
> supposed to be alive, the thread locks and unlocks immediately the write 
> lockdep map before the waiting event (maybe I understood something wrong 
> here?).

No this is correct.

> Is this the only Lockdep model that can be used for these 
> waiting events?

We've used this model because what jbd2 with transaction handles is that
essentially every existing journal handle is a reference to the running
transaction - this reference is modeled by 'read acquisition' - and
transaction commit and consequently places waiting for more journal space
has to wait for all outstanding handles - this wait is modeled by the
'write acquisition'.

But certainly there are different wait-wake schemes that could be modeled
differently with lockdep.

> For your reference, here are 2 examples that we are trying to annotate 
> with Lockdep and we would like to know if we are on the correct track.
> 
> In the first example it makes sense to use the JBD2 model, however we 
> are not sure how to apply the model in the second case. The comments 
> indicate our concerns.
> 
> ------------------------------
> Simple Case:
> 
> TA
> rwsem_acquire_read(lockdep_map);
> cond=false
> do_work()
> cond=true
> rwsem_release_read(lockdep_map);
> signal(w)
> 
> TB
> rwsem_acquire(lockdep_map);
> rswem_release(lockdep_map);
> wait_event(w, cond==true)
> 
> Advanced Case:
> 
> TA
> rwsem_acquire_read(lockdep_map)
> cond=false
> // exits while holding the lock
> 
> TB
> cond=true
> rwsem_release_read(lockdep_map) // We do not know that we hold the lock
> signal(w)
> 
> TC
> rwsem_acquire(lockdep_map);
> rswem_release(lockdep_map);
> wait_event(w, cond==true)

So this is difficult to track with lockdep because lockdep supports only
task-local locking so when "resource ownership" moves between tasks things
are difficult to track. How we usually do this (e.g. we did something
similar in fs/aio.c where filesystem freeze protection is acquired on IO
submission and we release it on IO completion from a different task /
context) is that we do:

TA
rwsem_acquire_read(lockdep_map)
cond=false
// push this as far as it is reasonably possible in TA to allow lockdep to
// track what needs to be done in TA while waiting for TB to do work
rwsem_release_read(lockdep_map)

TB
// Tell lockdep TB has inherited the resource, push this as early as
// reasonably possible to allow lockdep track most dependencies
rwsem_acquire_read(lockdep_map)
cond=true
signal(w)
rwsem_release_read(lockdep_map)

It is not perfect and some dependencies may be missed but it's better than
nothing.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Modeling wait events with Lockdep
  2022-07-01 11:59 ` Jan Kara
@ 2022-07-06  0:51   ` Ioannis Angelakopoulos
  0 siblings, 0 replies; 3+ messages in thread
From: Ioannis Angelakopoulos @ 2022-07-06  0:51 UTC (permalink / raw)
  To: Jan Kara; +Cc: mingo, jack, boris, josef, linux-kernel

On 7/1/22 4:59 AM, Jan Kara wrote:
> Hello!
> 
> On Thu 30-06-22 23:05:07, Ioannis Angelakopoulos wrote:
>> I would like to ask some questions regarding modeling waiting for events
>> (i.e the wait_event) in Linux using Lockdep.
>> I am trying to model these events in btrfs since there are deadlocks
>> detected involving waiting for events and Lockdep is not currently able
>> to address them (e.g.,
>> https://lore.kernel.org/linux-btrfs/cover.1655147296.git.josef@toxicpanda.com/).
>>
>> I am very new to Lockdep so I would like to know, what would be the
>> correct way of implementing these models using Lockdep?
>>
>> I noticed that JBD2 uses a read-write lockdep map. It takes the read
>> lockdep map when it creates a transaction handle and unlocks the read
>> lockdep map when it frees the handle. Also, every time the thread has to
>> wait for resources (e.g., transaction credits) and the handle is not
>> supposed to be alive, the thread locks and unlocks immediately the write
>> lockdep map before the waiting event (maybe I understood something wrong
>> here?).
> 
> No this is correct.
> 
>> Is this the only Lockdep model that can be used for these
>> waiting events?
> 
> We've used this model because what jbd2 with transaction handles is that
> essentially every existing journal handle is a reference to the running
> transaction - this reference is modeled by 'read acquisition' - and
> transaction commit and consequently places waiting for more journal space
> has to wait for all outstanding handles - this wait is modeled by the
> 'write acquisition'.
> 
> But certainly there are different wait-wake schemes that could be modeled
> differently with lockdep.
> 
>> For your reference, here are 2 examples that we are trying to annotate
>> with Lockdep and we would like to know if we are on the correct track.
>>
>> In the first example it makes sense to use the JBD2 model, however we
>> are not sure how to apply the model in the second case. The comments
>> indicate our concerns.
>>
>> ------------------------------
>> Simple Case:
>>
>> TA
>> rwsem_acquire_read(lockdep_map);
>> cond=false
>> do_work()
>> cond=true
>> rwsem_release_read(lockdep_map);
>> signal(w)
>>
>> TB
>> rwsem_acquire(lockdep_map);
>> rswem_release(lockdep_map);
>> wait_event(w, cond==true)
>>
>> Advanced Case:
>>
>> TA
>> rwsem_acquire_read(lockdep_map)
>> cond=false
>> // exits while holding the lock
>>
>> TB
>> cond=true
>> rwsem_release_read(lockdep_map) // We do not know that we hold the lock
>> signal(w)
>>
>> TC
>> rwsem_acquire(lockdep_map);
>> rswem_release(lockdep_map);
>> wait_event(w, cond==true)
> 
> So this is difficult to track with lockdep because lockdep supports only
> task-local locking so when "resource ownership" moves between tasks things
> are difficult to track. How we usually do this (e.g. we did something
> similar in fs/aio.c where filesystem freeze protection is acquired on IO
> submission and we release it on IO completion from a different task /
> context) is that we do:
> 
> TA
> rwsem_acquire_read(lockdep_map)
> cond=false
> // push this as far as it is reasonably possible in TA to allow lockdep to
> // track what needs to be done in TA while waiting for TB to do work
> rwsem_release_read(lockdep_map)
> 
> TB
> // Tell lockdep TB has inherited the resource, push this as early as
> // reasonably possible to allow lockdep track most dependencies
> rwsem_acquire_read(lockdep_map)
> cond=true
> signal(w)
> rwsem_release_read(lockdep_map)
> 
> It is not perfect and some dependencies may be missed but it's better than
> nothing.
> 
> 								Honza
Thank you so much for the clarification and your illustrative example!

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-07-06  0:51 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-30 23:05 Modeling wait events with Lockdep Ioannis Angelakopoulos
2022-07-01 11:59 ` Jan Kara
2022-07-06  0:51   ` Ioannis Angelakopoulos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).