All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: driver/base/dd.c lockdep warning
       [not found] <Pine.LNX.4.44L0.0909011146030.3190-100000@iolanthe.rowland.org>
@ 2009-09-02 10:38 ` Peter Zijlstra
  2009-09-02 14:22   ` Alan Stern
  2009-09-02 18:53   ` Jan Blunck
  0 siblings, 2 replies; 3+ messages in thread
From: Peter Zijlstra @ 2009-09-02 10:38 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jan Blunck, Kay Sievers, gregkh, kasievers, USB list,
	Thomas Gleixner, linux-kernel, Ingo Molnar,
	Uwe Kleine-König

On Tue, 2009-09-01 at 11:50 -0400, Alan Stern wrote:
> On Tue, 1 Sep 2009, Jan Blunck wrote:
> 
> > On Tue, Sep 01, Alan Stern wrote:
> > 
> > > On Tue, 1 Sep 2009, Jan Blunck wrote:
> > > 
> > > > > This is semaphore->mutex conversion madness from tglx.  What he tried 
> > > > > to do just doesn't work with lockdep.
> > > > > 
> > > > 
> > > > If this is a parent->child relationship and the parent is always locked before
> > > > the child this works perfectly with lockdep. The inode->i_mutex is doing
> > > > it. How is the lock in your code different from that?
> > > 
> > > Maybe you're right and it's not different.  I'm not so sure.  What
> > > about parent-child-grandchild relationships?  What about situations
> > > where multiple siblings are locked concurrently after first acquiring
> > > the parent's lock to make it safe (not that I'm aware of any such
> > > things occurring in the kernel, but they might)?
> > 
> > You have to come up with a locking order on the siblings to make it deadlock
> > free. After that you teach the locking order to lockdep and everything should
> > be fine.
> 
> That's not true at all.  Provided you always lock the parent before 
> locking any of the children, the order in which you lock the children 
> doesn't matter; it cannot deadlock.
> 
> > If nobody is working on this I'll try to come up with a few patches tomorrow.
> 
> Okay.  Peter Zijlstra had some thoughts when this issue came up a week 
> or two ago, CC'ing him.
> 
> Do you have to specify at the time you lock a mutex whether it is a 
> parent or a child?  What happens if it is both?

OK, so the problem is that lockdep groups individual locks into classes
and does lock dependency computations on these classes instead of on
individual locks (this is what keeps the whole exercise feasible, this
also makes it more powerful in that we can detect a lock order inversion
before it actually happens).

When you nest two locks of the same class it can't say whether its the
same lock or two locks, nor can it determine if you do indeed observe a
proper locking order between two instances when it are indeed two
separate locks.

Classes are normally created per lock init site, that is, all locks
initialized from a particular spin_lock_init() site will belong to the
same class by default.

There's a number of lockdep annotations to help out.

 - *_lock_nested(&lock, subclass)

   It says: we know what we're doing, consider this lock in $subclass
   and lockdep will then consider this lock op part of a subclass.

   $subclass is limited to [0-7], and spin_lock(&lock) is equal to
   spin_lock_nested(&lock, 0). Also, any subclass is free to nest in any
   other, as long as its consistent.

   This is useful for limited nesting situations, eg. vfs parent/child
   inode relations.

   It is possible to annotate a real deadlock away using this, consider:

   void double_lock(spinlock_t *a, spinlock_t *b)
   {
	spin_lock(a);
	spin_lock_nested(b, SINGLE_DEPTH_NESTING);
   }

   double_lock(&lock1, &lock2);

   vs.   

   double_lock(&lock2, &lock1);

   This will _NOT_ warn, but will most certainly lead to a deadlock.

 - lockdep_set_class*(&lock, &lock_class_key, ...)

   This will manually assign a different lock class to a lock, and this
   needs to be done _after_ *_lock_init() but _before_ the first actual
   use of this lock.

   The struct lock_class_key _must_ be in static storage.

   An example is in the vfs, which uses this to separate the locks
   on an filesystem type basis, since some filesystems have different
   (and conflicting) locking orders.

 - spin_lock_nest_lock(&lock, &parent) 

   [ currently not available for mutexes for no other reason than
     that there is no user ]

   This will allow nesting of lock's class and validates parent is
   indeed taken.

   It will revert to counting lock instances and allows of up to 2048
   child locks of that particular class to nest. This weakens lockdep
   and lockstat in that the unlock() path doesn't know if it was indeed
   this particular lock, and lockstat in that it will only track the
   first lock instance.

   Again, this will allow annotating away read deadlocks, consider
   multiple child locks being taking without holding parent.

Furthermore, there is a limit of (48) locks that can be tracked at any
one time.

Now the particular issue at hand is that the device tree is a free form
tree with (afaiu) unlimited lock nesting depth -- if you were to plug in
an already daisy chained usb hub with actual devices on the 7th deep hub
device discovery will hold the device locks for the root hub, all 7
intermediate hubs and the child device, leading to a total of 9 locks.

And there is nothing fundamental -- other than the usb chain depth --
that limits this scenario, imagine the device to be an i2c bus with yet
more devices ;-)

[ There used to be the additional complexity that on suspend/resume
  we would hold _ALL_ device locks, which would exceed the max we can
  track for any one task, this however has long been fixed. ]


So the proposal I currently have to solve this is to allocate 48 lock
classes:

struct lock_class_key device_tree_depth[MAX_LOCK_DEPTH];

and when creating a new device node, set the lock class corresponding
the depth in the tree:

  mutex_lock_init(&device->lock);
  BUG_ON(device->depth >= MAX_LOCK_DEPTH); // surely we're not that deep
  lockdep_set_class(&device->lock, device_tree_depth + device->depth);
  ...
  mutex_lock(&device->lock); /* already have parent locked */
  device_attach(device, parent);

and take multiple child locks using:

  mutex_lock_nest_lock(&device->lock, &device->parent->lock);

Which, I think should work for most cases out there.

Alan had some funny corner cases, but I think he wasn't sure whether
those would indeed show up in reality.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: driver/base/dd.c lockdep warning
  2009-09-02 10:38 ` driver/base/dd.c lockdep warning Peter Zijlstra
@ 2009-09-02 14:22   ` Alan Stern
  2009-09-02 18:53   ` Jan Blunck
  1 sibling, 0 replies; 3+ messages in thread
From: Alan Stern @ 2009-09-02 14:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jan Blunck, Kay Sievers, gregkh, kasievers, USB list,
	Thomas Gleixner, linux-kernel, Ingo Molnar,
	Uwe Kleine-König

On Wed, 2 Sep 2009, Peter Zijlstra wrote:

> There's a number of lockdep annotations to help out.
...

Thanks for the detailed explanation; I appreciate it.

> Now the particular issue at hand is that the device tree is a free form
> tree with (afaiu) unlimited lock nesting depth -- if you were to plug in
> an already daisy chained usb hub with actual devices on the 7th deep hub
> device discovery will hold the device locks for the root hub, all 7
> intermediate hubs and the child device, leading to a total of 9 locks.

That's true in principle but not in practice because of the way the USB 
hub driver is designed.  Children aren't discovered and registered when 
the parent is probed; they are handled later.

Of course, this kind of scenario absolutely could occur with other 
device types.

> And there is nothing fundamental -- other than the usb chain depth --
> that limits this scenario, imagine the device to be an i2c bus with yet
> more devices ;-)
> 
> [ There used to be the additional complexity that on suspend/resume
>   we would hold _ALL_ device locks, which would exceed the max we can
>   track for any one task, this however has long been fixed. ]
> 
> 
> So the proposal I currently have to solve this is to allocate 48 lock
> classes:
> 
> struct lock_class_key device_tree_depth[MAX_LOCK_DEPTH];
> 
> and when creating a new device node, set the lock class corresponding
> the depth in the tree:
> 
>   mutex_lock_init(&device->lock);
>   BUG_ON(device->depth >= MAX_LOCK_DEPTH); // surely we're not that deep
>   lockdep_set_class(&device->lock, device_tree_depth + device->depth);
>   ...
>   mutex_lock(&device->lock); /* already have parent locked */
>   device_attach(device, parent);
> 
> and take multiple child locks using:
> 
>   mutex_lock_nest_lock(&device->lock, &device->parent->lock);
> 
> Which, I think should work for most cases out there.

I agree.  It would be rather surprising to find a chain of devices 
nested more than 48 deep.

> Alan had some funny corner cases, but I think he wasn't sure whether
> those would indeed show up in reality.

Yes; there's no point worrying about them now.  This sounds like a good 
approach.

Alan Stern


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: driver/base/dd.c lockdep warning
  2009-09-02 10:38 ` driver/base/dd.c lockdep warning Peter Zijlstra
  2009-09-02 14:22   ` Alan Stern
@ 2009-09-02 18:53   ` Jan Blunck
  1 sibling, 0 replies; 3+ messages in thread
From: Jan Blunck @ 2009-09-02 18:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alan Stern, Kay Sievers, gregkh, kasievers, USB list,
	Thomas Gleixner, linux-kernel, Ingo Molnar,
	Uwe Kleine-König

On Wed, Sep 02, Peter Zijlstra wrote:

> 
> So the proposal I currently have to solve this is to allocate 48 lock
> classes:
> 
> struct lock_class_key device_tree_depth[MAX_LOCK_DEPTH];
> 
> and when creating a new device node, set the lock class corresponding
> the depth in the tree:
> 
>   mutex_lock_init(&device->lock);
>   BUG_ON(device->depth >= MAX_LOCK_DEPTH); // surely we're not that deep
>   lockdep_set_class(&device->lock, device_tree_depth + device->depth);
>   ...
>   mutex_lock(&device->lock); /* already have parent locked */
>   device_attach(device, parent);
> 
> and take multiple child locks using:
> 
>   mutex_lock_nest_lock(&device->lock, &device->parent->lock);
> 
> Which, I think should work for most cases out there.
> 
> Alan had some funny corner cases, but I think he wasn't sure whether
> those would indeed show up in reality.

JFYI, I tried to get away with just parent, child and normal class but
the problem is that device_add() is called recursive via driver probing
code. So either we change that (and iterate down the device tree) or we need
to implement what you proposed.

Cheers,
Jan

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2009-09-02 18:53 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.44L0.0909011146030.3190-100000@iolanthe.rowland.org>
2009-09-02 10:38 ` driver/base/dd.c lockdep warning Peter Zijlstra
2009-09-02 14:22   ` Alan Stern
2009-09-02 18:53   ` Jan Blunck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.