All of lore.kernel.org
 help / color / mirror / Atom feed
* Lockdep false positive in sysfs
@ 2012-04-25 18:58 Alan Stern
  2012-04-25 21:59 ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-04-25 18:58 UTC (permalink / raw)
  To: Peter Zijlstra, Tejun Heo; +Cc: Kernel development list

Peter and Tejun:

Here's my problem, which affects several sysfs attribute methods in the 
USB subsystem:

Sysfs attribute A is attached to a USB device D.  When the user writes
to A, the corresponding store method unregisters D's children (it does
not unregister D, though).

Now, some of these children may also be USB devices (i.e., if D is a
hub), and therefore may have the same set of sysfs attributes.  As a
result, A's store method for D will end up removing the A attribute for
device E, where E is a child of D.

This causes lockdep to complain.  When A's method is called, sysfs
tells lockdep that it holds a readlock for the s_active "rwsem"
associated with the A attribute for D.  However the sysfs routine that
removes attributes tells lockdep that it is going to get a writelock
for the s_active associated with the A attribute for E, which is in the
same lockdep class since it belongs to the same attribute.

The resulting splat looks like this:

[ 1004.679564] =============================================
[ 1004.680053] [ INFO: possible recursive locking detected ]
[ 1004.680053] 3.4.0-rc4+ #17 Tainted: G           O
[ 1004.680053] ---------------------------------------------
[ 1004.680053] bash/1256 is trying to acquire lock:
[ 1004.680053]  (s_active#73){++++.+}, at: [<c10c9c2b>] sysfs_hash_and_remove+0x60/0x71
[ 1004.680053] 
[ 1004.680053] but task is already holding lock:
[ 1004.680053]  (s_active#73){++++.+}, at: [<c10c9fce>] sysfs_write_file+0xa7/0xea
[ 1004.680053] 
[ 1004.680053] other info that might help us debug this:
[ 1004.680053]  Possible unsafe locking scenario:
[ 1004.680053] 
[ 1004.680053]        CPU0
[ 1004.680053]        ----
[ 1004.680053]   lock(s_active#73);
[ 1004.680053]   lock(s_active#73);
[ 1004.680053] 
[ 1004.680053]  *** DEADLOCK ***
[ 1004.680053] 
[ 1004.680053]  May be due to missing lock nesting notation
[ 1004.680053] 
[ 1004.680053] 4 locks held by bash/1256:
[ 1004.680053]  #0:  (&buffer->mutex){+.+.+.}, at: [<c10c9f4c>] sysfs_write_file+0x25/0xea
[ 1004.680053]  #1:  (s_active#73){++++.+}, at: [<c10c9fce>] sysfs_write_file+0xa7/0xea
[ 1004.680053]  #2:  (&__lockdep_no_validate__){......}, at: [<f002c376>] usb_deauthorize_device+0x16/0xb2 [usbcore]
[ 1004.680053]  #3:  (&__lockdep_no_validate__){......}, at: [<c1185767>] device_release_driver+0x13/0x25
[ 1004.680053] 
[ 1004.680053] stack backtrace:
[ 1004.680053] Pid: 1256, comm: bash Tainted: G           O 3.4.0-rc4+ #17
[ 1004.680053] Call Trace:
[ 1004.680053]  [<c101ea89>] ? console_unlock+0x1ad/0x1d3
[ 1004.680053]  [<c104f712>] __lock_acquire+0x82f/0xb8a
[ 1004.680053]  [<c104df6d>] ? mark_lock+0x26/0x21f
[ 1004.680053]  [<c104e47a>] ? debug_check_no_locks_freed+0x112/0x11c
[ 1004.680053]  [<c104e31c>] ? trace_hardirqs_on_caller+0x13f/0x17e
[ 1004.680053]  [<c104e366>] ? trace_hardirqs_on+0xb/0xd
[ 1004.680053]  [<c104fde5>] lock_acquire+0x5e/0x75
[ 1004.680053]  [<c10c9c2b>] ? sysfs_hash_and_remove+0x60/0x71
[ 1004.680053]  [<c10caf8c>] sysfs_addrm_finish+0x86/0xd2
[ 1004.680053]  [<c10c9c2b>] ? sysfs_hash_and_remove+0x60/0x71
[ 1004.680053]  [<c10c0000>] ? m_next+0x4c/0x56
[ 1004.680053]  [<c10c0000>] ? m_next+0x4c/0x56
[ 1004.680053]  [<c10c9c2b>] sysfs_hash_and_remove+0x60/0x71
[ 1004.680053]  [<c10cc329>] sysfs_remove_group+0x56/0x77
[ 1004.680053]  [<c118331e>] device_remove_groups+0x17/0x25
[ 1004.680053]  [<c11834b5>] device_remove_attrs+0x1c/0x47
[ 1004.680053]  [<c11835c6>] device_del+0xe6/0x135
[ 1004.680053]  [<f002b658>] usb_disconnect+0x9a/0xd4 [usbcore]
[ 1004.680053]  [<f002b6d6>] hub_quiesce+0x44/0x83 [usbcore]
[ 1004.680053]  [<f002b86c>] hub_disconnect+0x6b/0xdb [usbcore]
[ 1004.680053]  [<f0032f1d>] usb_unbind_interface+0x42/0xf9 [usbcore]
[ 1004.680053]  [<c1185719>] __device_release_driver+0x66/0xa1
[ 1004.680053]  [<c118576e>] device_release_driver+0x1a/0x25
[ 1004.680053]  [<c1185398>] bus_remove_device+0xa3/0xb0
[ 1004.680053]  [<c11835cd>] device_del+0xed/0x135
[ 1004.680053]  [<f00316fb>] usb_disable_device+0x79/0x19b [usbcore]
[ 1004.680053]  [<f0031e1d>] usb_set_configuration+0x18e/0x513 [usbcore]
[ 1004.680053]  [<f002c397>] usb_deauthorize_device+0x37/0xb2 [usbcore]
[ 1004.680053]  [<f0035204>] usb_dev_authorized_store+0x31/0x43 [usbcore]
[ 1004.680053]  [<f00351d3>] ? set_persist+0x67/0x67 [usbcore]
[ 1004.680053]  [<c1182f83>] dev_attr_store+0x1b/0x23
[ 1004.680053]  [<c10c9fe5>] sysfs_write_file+0xbe/0xea
[ 1004.680053]  [<c10c9f27>] ? sysfs_open_file+0x1c6/0x1c6
[ 1004.680053]  [<c108ebc0>] vfs_write+0x74/0xa0
[ 1004.680053]  [<c108ed21>] sys_write+0x3b/0x5d
[ 1004.680053]  [<c1221210>] sysenter_do_call+0x12/0x36

Normally one solves this sort of problem by annotating one of the locks
as a nested call.  But there doesn't appear to be any reasonable way of
doing this for sysfs attributes.

I could work around the problem by having the method do everything in a 
workqueue, but I would prefer not to -- the actions taken by these 
attributes really ought to be synchronous.

Do you guys have any suggestions for better solutions?

Alan Stern


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-25 18:58 Lockdep false positive in sysfs Alan Stern
@ 2012-04-25 21:59 ` Tejun Heo
  2012-04-26  8:16   ` Eric W. Biederman
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-04-25 21:59 UTC (permalink / raw)
  To: Alan Stern; +Cc: Peter Zijlstra, Kernel development list

Hello, Alan.

On Wed, Apr 25, 2012 at 02:58:28PM -0400, Alan Stern wrote:
> Peter and Tejun:
> 
> Here's my problem, which affects several sysfs attribute methods in the 
> USB subsystem:
> 
> Sysfs attribute A is attached to a USB device D.  When the user writes
> to A, the corresponding store method unregisters D's children (it does
> not unregister D, though).
> 
> Now, some of these children may also be USB devices (i.e., if D is a
> hub), and therefore may have the same set of sysfs attributes.  As a
> result, A's store method for D will end up removing the A attribute for
> device E, where E is a child of D.
> 
> This causes lockdep to complain.  When A's method is called, sysfs
> tells lockdep that it holds a readlock for the s_active "rwsem"
> associated with the A attribute for D.  However the sysfs routine that
> removes attributes tells lockdep that it is going to get a writelock
> for the s_active associated with the A attribute for E, which is in the
> same lockdep class since it belongs to the same attribute.

Hmmm.... This happens because, by default, sysfs_dirents for the same
attr share the same lockdep key.  This happens from
sysfs_dirent_init_lockdep().  Hmm.... we can,

* Somehow assign different keys to sysfs_dirents for the specific
  attr.  Use array of attrs indexed by bus depth?

* Add a flag / whatever to attr indicating that the files of the
  attribute may be removed recursively (lockdep-wise) and update
  either read or write path to use subclass.

Any better ideas?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-25 21:59 ` Tejun Heo
@ 2012-04-26  8:16   ` Eric W. Biederman
  2012-04-26 18:14     ` Alan Stern
  0 siblings, 1 reply; 24+ messages in thread
From: Eric W. Biederman @ 2012-04-26  8:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Alan Stern, Peter Zijlstra, Kernel development list

Tejun Heo <tj@kernel.org> writes:

> Hello, Alan.
>
> On Wed, Apr 25, 2012 at 02:58:28PM -0400, Alan Stern wrote:
>> Peter and Tejun:
>> 
>> Here's my problem, which affects several sysfs attribute methods in the 
>> USB subsystem:
>> 
>> Sysfs attribute A is attached to a USB device D.  When the user writes
>> to A, the corresponding store method unregisters D's children (it does
>> not unregister D, though).
>> 
>> Now, some of these children may also be USB devices (i.e., if D is a
>> hub), and therefore may have the same set of sysfs attributes.  As a
>> result, A's store method for D will end up removing the A attribute for
>> device E, where E is a child of D.

Before we get too far getting lockdep to shut up let me ask.  Is this
really safe?  The code looks like it tries to be safe but in the case
of recursion I don't buy it.  You should be having this same problem
with other locks and I don't see the use of mutex_lock_nested anywhere.

I looked into this a couple years ago with regards to pci devices, and
my conclusion at the time was that the device core nor the pci
code was multi-thread hotplug safe, and that the easiest fix was
to use a single threaded workqueue.

What I see right now with the device_lock and usb looks better but
I'm not at all certain I believe that if you attempt to remove different
levels of the hierarchy at the same time from different processes
that this code is really safe.  Could you explain to me how the
locking works in the usb layer to prevent problems with different
processes removing different layers of the same hierarchy?

Especially having it work without having having to do something
interesting with lockdep.

>> This causes lockdep to complain.  When A's method is called, sysfs
>> tells lockdep that it holds a readlock for the s_active "rwsem"
>> associated with the A attribute for D.  However the sysfs routine that
>> removes attributes tells lockdep that it is going to get a writelock
>> for the s_active associated with the A attribute for E, which is in the
>> same lockdep class since it belongs to the same attribute.
>
> Hmmm.... This happens because, by default, sysfs_dirents for the same
> attr share the same lockdep key.  This happens from
> sysfs_dirent_init_lockdep().  Hmm.... we can,
>
> * Somehow assign different keys to sysfs_dirents for the specific
>   attr.  Use array of attrs indexed by bus depth?

Possible with sysfs_attr_init but pretty ugly.  Especially since it
sounds like this is a situation that does not presuppose a maximum
depth.  I do remember that the lockdep keys must be statically allocated
which makes this a challenge.

Or it might be a remote possibility to adapt things a little bit.
What makes this different from most of the recursing lock strategies
that I know of is that we are actually in fact taking the same
"lock" in the same place, and performing the same action.

> * Add a flag / whatever to attr indicating that the files of the
>   attribute may be removed recursively (lockdep-wise) and update
>   either read or write path to use subclass.
>
> Any better ideas?

I'm not going to worry about it too much until someone explains to me
how usb hotplug is really safe and how of all the locks taken to
make it happen somehow only the sysfs lock triggers a lockdep warning
because we are recursing.  We should be seeing the same issue with
the device lock if the device lock actually provides sufficient
protection to be useful.

Eric

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-26  8:16   ` Eric W. Biederman
@ 2012-04-26 18:14     ` Alan Stern
  2012-04-26 22:17       ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-04-26 18:14 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list

On Thu, 26 Apr 2012, Eric W. Biederman wrote:

> Tejun Heo <tj@kernel.org> writes:
> 
> > Hello, Alan.
> >
> > On Wed, Apr 25, 2012 at 02:58:28PM -0400, Alan Stern wrote:
> >> Peter and Tejun:
> >> 
> >> Here's my problem, which affects several sysfs attribute methods in the 
> >> USB subsystem:
> >> 
> >> Sysfs attribute A is attached to a USB device D.  When the user writes
> >> to A, the corresponding store method unregisters D's children (it does
> >> not unregister D, though).
> >> 
> >> Now, some of these children may also be USB devices (i.e., if D is a
> >> hub), and therefore may have the same set of sysfs attributes.  As a
> >> result, A's store method for D will end up removing the A attribute for
> >> device E, where E is a child of D.
> 
> Before we get too far getting lockdep to shut up let me ask.  Is this
> really safe?  The code looks like it tries to be safe but in the case
> of recursion I don't buy it.  You should be having this same problem
> with other locks and I don't see the use of mutex_lock_nested anywhere.

You're right, of course.  In the USB subsystem, configuration changes
(which are what I was talking about above) and device unregistration
are protected by the mutex of the underlying struct device.  Thus,
you're not allowed to change a device's configuration (which in the
case of a hub would end up unregistering all the children) unless you
already hold the device's lock, and you're not allowed to unregister a
USB device unless you already hold its parent's lock.

The reason this doesn't provoke any objections from lockdep is very 
simple: The mutex in struct device is explicitly set to the 
"novalidate" class in device_initialize().  Therefore lockdep 
essentially ignores it.

We did this because after a fair amount of back-and-forth discussion,
Peter and I were unable to come up with a good way of treating the
tree-structured locks in the device hierarchy.  (Actually it's a
forest, not a tree, which makes the situation even worse.)

I don't know of any other locks using the "novalidate" class, but I
haven't looked to see.  In any case, the locks used by sysfs do _not_
belong to this class, which is a good thing because we really do want
to catch cases where an attribute's method tries to remove that same
attribute -- so-called "suicidal" attributes.

But that's not what's happening here.  In this situation, attribute A
for device D wants to remove attribute A for device E, not for D.  
This is safe, not suicidal, but there's currently no way to tell 
lockdep.

Does this answer your question?  I'm not quite certain what you were 
asking.

> I looked into this a couple years ago with regards to pci devices, and
> my conclusion at the time was that the device core nor the pci
> code was multi-thread hotplug safe, and that the easiest fix was
> to use a single threaded workqueue.

As far as I know, the device core and the USB core are both
multi-thread hotplug safe.  I'm not as familiar with the PCI core.

> What I see right now with the device_lock and usb looks better but
> I'm not at all certain I believe that if you attempt to remove different
> levels of the hierarchy at the same time from different processes
> that this code is really safe.  Could you explain to me how the
> locking works in the usb layer to prevent problems with different
> processes removing different layers of the same hierarchy?
> 
> Especially having it work without having having to do something
> interesting with lockdep.

The locking rules are explained above.

Let's consider an example.  Suppose we have A - B - C, where A is the 
parent of B and B is the parent of C.  Suppose thread 1 tries to remove 
B and concurrently, thread 2 tries to remove C.

The resulting course of events depends on which thread acquires B's
mutex first.  Here's one possibility:

	Thread 1			Thread 2
	--------			--------
	usb_lock_device(A)		usb_lock_device(B)
	usb_disconnect(B)		usb_disconnect(C)
	  usb_lock_device(B) (blocks)	  usb_lock_device(C)
					  ... various housekeeping ...
					  usb_unlock_device(C)
					  device_del(&C->dev)
					  put_device(&C->dev)
					usb_unlock_device(B)
	  (acquires B's mutex)
	  ... various housekeeping, including:
	    remove any children of B (but C is
	    not a child of B any more)
	  ... end of housekeeping
	  usb_unlock_device(B)
	  device_del(&B->dev)
	  put_device(&B->dev)
	usb_unlock_device(A)

Here's the other:

	Thread 1			Thread 2
	--------			--------
	usb_lock_device(A)
	usb_disconnect(B)
	  usb_lock_device(B)
	  ... various housekeeping, including:
	    usb_set_device_state(B, USB_STATE_NOTATTACHED)
	    usb_disconnect(C) (recursive call)
	      usb_lock_device(C)
	      ... various housekeeping ...
	      usb_unlock_device(C)	usb_lock_device(B) (blocks)
	      device_del(&C->dev)
	      put_device(&C->dev)
	  ... end of housekeeping
	  usb_unlock_device(B)
					(acquires B's mutex)
	  device_del(&B->dev)		Sees that B's state is now
					USB_STATE_NOTATTACHED, so does
					not try to do anything about C
					usb_unlock_device(B)
	  put_device(&B->dev)
	usb_unlock_device(A)	

Note that device_del() internally grabs the device's mutex, so thread
1's put_device(&B->dev) doesn't occur until after thread 2 unlocks B.

> >> This causes lockdep to complain.  When A's method is called, sysfs
> >> tells lockdep that it holds a readlock for the s_active "rwsem"
> >> associated with the A attribute for D.  However the sysfs routine that
> >> removes attributes tells lockdep that it is going to get a writelock
> >> for the s_active associated with the A attribute for E, which is in the
> >> same lockdep class since it belongs to the same attribute.
> >
> > Hmmm.... This happens because, by default, sysfs_dirents for the same
> > attr share the same lockdep key.  This happens from
> > sysfs_dirent_init_lockdep().  Hmm.... we can,
> >
> > * Somehow assign different keys to sysfs_dirents for the specific
> >   attr.  Use array of attrs indexed by bus depth?
> 
> Possible with sysfs_attr_init but pretty ugly.  Especially since it
> sounds like this is a situation that does not presuppose a maximum
> depth.  I do remember that the lockdep keys must be statically allocated
> which makes this a challenge.

I agree; this doesn't seem like a good approach.

> Or it might be a remote possibility to adapt things a little bit.
> What makes this different from most of the recursing lock strategies
> that I know of is that we are actually in fact taking the same
> "lock" in the same place, and performing the same action.

Well, not exactly the same place.  The readlock is acquired in 
sysfs_get_active() and the writelock is acquired in sysfs_deactivate().

> > * Add a flag / whatever to attr indicating that the files of the
> >   attribute may be removed recursively (lockdep-wise) and update
> >   either read or write path to use subclass.
> >
> > Any better ideas?
> 
> I'm not going to worry about it too much until someone explains to me
> how usb hotplug is really safe and how of all the locks taken to
> make it happen somehow only the sysfs lock triggers a lockdep warning
> because we are recursing.  We should be seeing the same issue with
> the device lock if the device lock actually provides sufficient
> protection to be useful.

Another idea is to have A's method temporarily drop the sysfs readlock.  
Of course that would put the onus on the USB core of guaranteeing that
A cannot be removed while this happens, but we can handle that.

Alan Stern


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-26 18:14     ` Alan Stern
@ 2012-04-26 22:17       ` Tejun Heo
  2012-04-27 15:57         ` Alan Stern
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-04-26 22:17 UTC (permalink / raw)
  To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Thu, Apr 26, 2012 at 02:14:30PM -0400, Alan Stern wrote:
> > > Hmmm.... This happens because, by default, sysfs_dirents for the same
> > > attr share the same lockdep key.  This happens from
> > > sysfs_dirent_init_lockdep().  Hmm.... we can,
> > >
> > > * Somehow assign different keys to sysfs_dirents for the specific
> > >   attr.  Use array of attrs indexed by bus depth?
> > 
> > Possible with sysfs_attr_init but pretty ugly.  Especially since it
> > sounds like this is a situation that does not presuppose a maximum
> > depth.  I do remember that the lockdep keys must be statically allocated
> > which makes this a challenge.

The depth is limited by USB spec.

> I agree; this doesn't seem like a good approach.

It sure isn't pretty but probably best matches the situation in the
sense that lockdep would actually be able to know about the nesting
going on.

> Another idea is to have A's method temporarily drop the sysfs readlock.  
> Of course that would put the onus on the USB core of guaranteeing that
> A cannot be removed while this happens, but we can handle that.

Yeah, that's an easier way out.  Please make it a proper sysfs API
call tho so that people working on sysfs later can know of the special
case.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-26 22:17       ` Tejun Heo
@ 2012-04-27 15:57         ` Alan Stern
  2012-04-27 16:09           ` Tejun Heo
  2012-04-27 16:27           ` Eric W. Biederman
  0 siblings, 2 replies; 24+ messages in thread
From: Alan Stern @ 2012-04-27 15:57 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Thu, 26 Apr 2012, Tejun Heo wrote:

> On Thu, Apr 26, 2012 at 02:14:30PM -0400, Alan Stern wrote:
> > > > Hmmm.... This happens because, by default, sysfs_dirents for the same
> > > > attr share the same lockdep key.  This happens from
> > > > sysfs_dirent_init_lockdep().  Hmm.... we can,
> > > >
> > > > * Somehow assign different keys to sysfs_dirents for the specific
> > > >   attr.  Use array of attrs indexed by bus depth?
> > > 
> > > Possible with sysfs_attr_init but pretty ugly.  Especially since it
> > > sounds like this is a situation that does not presuppose a maximum
> > > depth.  I do remember that the lockdep keys must be statically allocated
> > > which makes this a challenge.
> 
> The depth is limited by USB spec.
> 
> > I agree; this doesn't seem like a good approach.
> 
> It sure isn't pretty but probably best matches the situation in the
> sense that lockdep would actually be able to know about the nesting
> going on.

By the way, do you know why attribute structures allow for dynamic keys 
as well as static keys?  I see dynamic keys are used by attribute 
containers, but I don't understand why.

> > Another idea is to have A's method temporarily drop the sysfs readlock.  
> > Of course that would put the onus on the USB core of guaranteeing that
> > A cannot be removed while this happens, but we can handle that.
> 
> Yeah, that's an easier way out.  Please make it a proper sysfs API
> call tho so that people working on sysfs later can know of the special
> case.

I will.

Would it be better to release just the lockdep annotation while
continuing to hold the actual lock, or to really drop the lock?

Alan Stern


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-27 15:57         ` Alan Stern
@ 2012-04-27 16:09           ` Tejun Heo
  2012-05-03 21:30             ` Alan Stern
  2012-04-27 16:27           ` Eric W. Biederman
  1 sibling, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-04-27 16:09 UTC (permalink / raw)
  To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

Hello,

On Fri, Apr 27, 2012 at 11:57:02AM -0400, Alan Stern wrote:
> By the way, do you know why attribute structures allow for dynamic keys 
> as well as static keys?  I see dynamic keys are used by attribute 
> containers, but I don't understand why.

I have no idea.  Eric?

> > > Another idea is to have A's method temporarily drop the sysfs readlock.  
> > > Of course that would put the onus on the USB core of guaranteeing that
> > > A cannot be removed while this happens, but we can handle that.
> > 
> > Yeah, that's an easier way out.  Please make it a proper sysfs API
> > call tho so that people working on sysfs later can know of the special
> > case.
> 
> I will.
> 
> Would it be better to release just the lockdep annotation while
> continuing to hold the actual lock, or to really drop the lock?

Just the lockdep annotation, I think.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-27 15:57         ` Alan Stern
  2012-04-27 16:09           ` Tejun Heo
@ 2012-04-27 16:27           ` Eric W. Biederman
  2012-04-27 18:27             ` Alan Stern
  1 sibling, 1 reply; 24+ messages in thread
From: Eric W. Biederman @ 2012-04-27 16:27 UTC (permalink / raw)
  To: Alan Stern; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list

Alan Stern <stern@rowland.harvard.edu> writes:

> On Thu, 26 Apr 2012, Tejun Heo wrote:
>
>> On Thu, Apr 26, 2012 at 02:14:30PM -0400, Alan Stern wrote:
>> > > > Hmmm.... This happens because, by default, sysfs_dirents for the same
>> > > > attr share the same lockdep key.  This happens from
>> > > > sysfs_dirent_init_lockdep().  Hmm.... we can,
>> > > >
>> > > > * Somehow assign different keys to sysfs_dirents for the specific
>> > > >   attr.  Use array of attrs indexed by bus depth?
>> > > 
>> > > Possible with sysfs_attr_init but pretty ugly.  Especially since it
>> > > sounds like this is a situation that does not presuppose a maximum
>> > > depth.  I do remember that the lockdep keys must be statically allocated
>> > > which makes this a challenge.
>> 
>> The depth is limited by USB spec.
>> 
>> > I agree; this doesn't seem like a good approach.
>> 
>> It sure isn't pretty but probably best matches the situation in the
>> sense that lockdep would actually be able to know about the nesting
>> going on.
>
> By the way, do you know why attribute structures allow for dynamic keys 
> as well as static keys?  I see dynamic keys are used by attribute 
> containers, but I don't understand why.

Some attributes are allocated dynamically, but they need static keys.
It is a lockdep requirement.  So for dynamically allocated attributes
we have to smack them so they have static keys.

Almost all sysfs attributes are declared statically so the optimization
of taking the key from the attribute structure saves modifying a lot of
code and makes the code that exists easy to get correct.

Dynamically allocated attributes break in a pretty obvious way when you
try to use them with lockdep enabled so I think we have a reasonable
solution.

>> > Another idea is to have A's method temporarily drop the sysfs readlock.  
>> > Of course that would put the onus on the USB core of guaranteeing that
>> > A cannot be removed while this happens, but we can handle that.
>> 
>> Yeah, that's an easier way out.  Please make it a proper sysfs API
>> call tho so that people working on sysfs later can know of the special
>> case.
>
> I will.
>
> Would it be better to release just the lockdep annotation while
> continuing to hold the actual lock, or to really drop the lock?

We can't really drop the ``lock''.  That would imply not waiting for
all of the methods using a sysfs attributes not to finish before we
remove a sysfs attribute.  Kaboom!

Probably what would be better would figure out how to sneak in something
that for this file we tell lockdep to ignore it like you do the device
locks.  As you do for the device layer locking.

I don't like the choices available at this junction.  I have seen some
nasty ABBA deadlocks with sysfs, and anything that makes those easier to
see.

One thing that looks promising after reading lwn yesterday is what Al
and Oleg are doing with fput and a per task work queue that runs before
we return to user space.  If you could use that we could have our
cake and eat it too.   You could schedule the work in a work queue but
you could also be guaranteed that there are not any locking problems.

Do you think you could investigate that possibility?

Eric


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-27 16:27           ` Eric W. Biederman
@ 2012-04-27 18:27             ` Alan Stern
  2012-04-27 20:17               ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-04-27 18:27 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list

On Fri, 27 Apr 2012, Eric W. Biederman wrote:

> >> > Another idea is to have A's method temporarily drop the sysfs readlock.  
> >> > Of course that would put the onus on the USB core of guaranteeing that
> >> > A cannot be removed while this happens, but we can handle that.
> >> 
> >> Yeah, that's an easier way out.  Please make it a proper sysfs API
> >> call tho so that people working on sysfs later can know of the special
> >> case.
> >
> > I will.
> >
> > Would it be better to release just the lockdep annotation while
> > continuing to hold the actual lock, or to really drop the lock?
> 
> We can't really drop the ``lock''.  That would imply not waiting for
> all of the methods using a sysfs attributes not to finish before we
> remove a sysfs attribute.  Kaboom!

That's why I wrote above "... would put the onus on the USB core 
of guaranteeing that A cannot be removed while this happens".  But 
regardless, we can keep the lock while dropping the annotation.

> Probably what would be better would figure out how to sneak in something
> that for this file we tell lockdep to ignore it like you do the device
> locks.  As you do for the device layer locking.

Dropping the annotation has pretty much that effect, right?  I don't 
want lockdep to ignore the locks entirely, because we still want to 
catch attributes that unregister themselves (as opposed to 
unregistering their counterparts in child devices).

> I don't like the choices available at this junction.  I have seen some
> nasty ABBA deadlocks with sysfs, and anything that makes those easier to
> see.
> 
> One thing that looks promising after reading lwn yesterday is what Al
> and Oleg are doing with fput and a per task work queue that runs before
> we return to user space.  If you could use that we could have our
> cake and eat it too.   You could schedule the work in a work queue but
> you could also be guaranteed that there are not any locking problems.
> 
> Do you think you could investigate that possibility?

Yes, I think that would work.  It would require a little more code and
some errors would not get reported correctly (those that occur during
the callback).

(Incidentally, the sysfs_workqueue used in sysfs_schedule_callback()  
could also be replaced by the per-task work queue.)

Which do you prefer: temporarily dropping the lockdep annotation, or
deferring the work to the per-task work queue?  Note that these "flush
before returning to userspace" things aren't in the kernel yet.

Alan Stern


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-27 18:27             ` Alan Stern
@ 2012-04-27 20:17               ` Tejun Heo
  2012-04-27 21:09                 ` Eric W. Biederman
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-04-27 20:17 UTC (permalink / raw)
  To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Fri, Apr 27, 2012 at 02:27:26PM -0400, Alan Stern wrote:
> Which do you prefer: temporarily dropping the lockdep annotation, or
> deferring the work to the per-task work queue?  Note that these "flush
> before returning to userspace" things aren't in the kernel yet.

It's about lockdep.  Let's not spill out to other areas unnecessarily.
I'd much prefer just working around lockdep annotation.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-27 20:17               ` Tejun Heo
@ 2012-04-27 21:09                 ` Eric W. Biederman
  2012-04-27 21:16                   ` Tejun Heo
  2012-04-29  2:00                   ` Alan Stern
  0 siblings, 2 replies; 24+ messages in thread
From: Eric W. Biederman @ 2012-04-27 21:09 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Alan Stern, Peter Zijlstra, Kernel development list

Tejun Heo <tj@kernel.org> writes:

> On Fri, Apr 27, 2012 at 02:27:26PM -0400, Alan Stern wrote:
>> Which do you prefer: temporarily dropping the lockdep annotation, or
>> deferring the work to the per-task work queue?  Note that these "flush
>> before returning to userspace" things aren't in the kernel yet.
>
> It's about lockdep.  Let's not spill out to other areas unnecessarily.
> I'd much prefer just working around lockdep annotation.

It is also about a locking scheme that makes analysis hard.

Doing something so we don't have false lockdep warnings is good
so people can concentrate on things that are clearly problems.

However it would be nice if we could sort out the locking so that it
isn't so tricky that neither lockdep nor sparse can figure it out.

I have the sneaking suspicion that idioms that tangle up our automatic
tools are also idioms that are likely to result in maintenance problems
at some point.

Another possibility to to look at the situation and realize that pci has
a maxium depth of 256 (bus numbers). And that usb also has a maxium
depth of I believe 256 ( If I read it right usb hubs are transparent to
usb enumeration so the maximum depth is the maximum number of usb ids
and I think the usb id is a 8 bit number).  I don't think anything else
even nests so deeply.  So it may be reasonable to declare an array of
256 or perhaps 1024 lockdep keys and limit the device tree when lockdep
is enabled to 1024 layers deep.

At which point we are at a point where lockdep can actually analyze the
behavior.

I don't mind the attitude we are clever careful programmers we can
handle the complexity and we can get away without the tool help us, but
I would much rather see the attitude that we are clever careful
programmers and we can figure out how to make the tool help us instead
of just ignoring it.

Eric

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-27 21:09                 ` Eric W. Biederman
@ 2012-04-27 21:16                   ` Tejun Heo
  2012-04-29  2:00                   ` Alan Stern
  1 sibling, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2012-04-27 21:16 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Alan Stern, Peter Zijlstra, Kernel development list

On Fri, Apr 27, 2012 at 02:09:22PM -0700, Eric W. Biederman wrote:
> I don't mind the attitude we are clever careful programmers we can
> handle the complexity and we can get away without the tool help us, but
> I would much rather see the attitude that we are clever careful
> programmers and we can figure out how to make the tool help us instead
> of just ignoring it.

I'm okay with using static array of keys so that each level maps to
separate key or just calling it a special case and ignoring it, but I
think it's quite silly to use an async mechanism just to avoid lockdep
warning.  Things like that tend to make things obscure as people
generally don't expect lockdep annotation to dictate overall
behaviors.  It is an annotation problem.  Let's keep it that way.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-27 21:09                 ` Eric W. Biederman
  2012-04-27 21:16                   ` Tejun Heo
@ 2012-04-29  2:00                   ` Alan Stern
  2012-04-29  2:35                     ` Eric W. Biederman
  1 sibling, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-04-29  2:00 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list

On Fri, 27 Apr 2012, Eric W. Biederman wrote:

> However it would be nice if we could sort out the locking so that it
> isn't so tricky that neither lockdep nor sparse can figure it out.
> 
> I have the sneaking suspicion that idioms that tangle up our automatic
> tools are also idioms that are likely to result in maintenance problems
> at some point.

That may well be true, but it won't be easy to avoid them.  At the
least, it would require a careful analysis of the device tree usage.

> Another possibility to to look at the situation and realize that pci has
> a maxium depth of 256 (bus numbers). And that usb also has a maxium
> depth of I believe 256 ( If I read it right usb hubs are transparent to
> usb enumeration so the maximum depth is the maximum number of usb ids
> and I think the usb id is a 8 bit number).

USB has a maximum depth of 7 or so.  It's limited by the number of hubs 
allowed on the path between the host and a device.

>  I don't think anything else
> even nests so deeply.  So it may be reasonable to declare an array of
> 256 or perhaps 1024 lockdep keys and limit the device tree when lockdep
> is enabled to 1024 layers deep.
> 
> At which point we are at a point where lockdep can actually analyze the
> behavior.

Unfortunately, we are not.  As I mentioned earlier, the device "tree" 
is really a forest.  Locks are sometimes acquired in orders that are 
not strictly downward.

> I don't mind the attitude we are clever careful programmers we can
> handle the complexity and we can get away without the tool help us, but
> I would much rather see the attitude that we are clever careful
> programmers and we can figure out how to make the tool help us instead
> of just ignoring it.

I'm certainly open to suggestions as to how to improve the situation, 
but the simple-minded "keep track of the depth in the tree" approach 
doesn't work.

Alan Stern


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-29  2:00                   ` Alan Stern
@ 2012-04-29  2:35                     ` Eric W. Biederman
  0 siblings, 0 replies; 24+ messages in thread
From: Eric W. Biederman @ 2012-04-29  2:35 UTC (permalink / raw)
  To: Alan Stern; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list

Alan Stern <stern@rowland.harvard.edu> writes:

> On Fri, 27 Apr 2012, Eric W. Biederman wrote:
>
>> However it would be nice if we could sort out the locking so that it
>> isn't so tricky that neither lockdep nor sparse can figure it out.
>> 
>> I have the sneaking suspicion that idioms that tangle up our automatic
>> tools are also idioms that are likely to result in maintenance problems
>> at some point.
>
> That may well be true, but it won't be easy to avoid them.  At the
> least, it would require a careful analysis of the device tree usage.
>
>> Another possibility to to look at the situation and realize that pci has
>> a maxium depth of 256 (bus numbers). And that usb also has a maxium
>> depth of I believe 256 ( If I read it right usb hubs are transparent to
>> usb enumeration so the maximum depth is the maximum number of usb ids
>> and I think the usb id is a 8 bit number).
>
> USB has a maximum depth of 7 or so.  It's limited by the number of hubs 
> allowed on the path between the host and a device.
>
>>  I don't think anything else
>> even nests so deeply.  So it may be reasonable to declare an array of
>> 256 or perhaps 1024 lockdep keys and limit the device tree when lockdep
>> is enabled to 1024 layers deep.
>> 
>> At which point we are at a point where lockdep can actually analyze the
>> behavior.
>
> Unfortunately, we are not.  As I mentioned earlier, the device "tree" 
> is really a forest.  Locks are sometimes acquired in orders that are 
> not strictly downward.

Then it sounds like for the device tree in general this is a limitation.
For sysfs with your problem usb attribute this looks like a real
solution.

>> I don't mind the attitude we are clever careful programmers we can
>> handle the complexity and we can get away without the tool help us, but
>> I would much rather see the attitude that we are clever careful
>> programmers and we can figure out how to make the tool help us instead
>> of just ignoring it.
>
> I'm certainly open to suggestions as to how to improve the situation, 
> but the simple-minded "keep track of the depth in the tree" approach 
> doesn't work.

For the general device tree perhaps.  For the problem of your removal
of your sysfs attribute the solution should be sufficient.  I don't
particularly like it because it will take a lot of testing to find any
bugs that lockdep might reveal but at least you will have a fighting
chance.

I would like to look at the general problem of the device tree locking.
Unfortunately I am tilting at enough other windmills right now that I
can't do that problem justice.

Eric

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-04-27 16:09           ` Tejun Heo
@ 2012-05-03 21:30             ` Alan Stern
  2012-05-04 16:52               ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-05-03 21:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Fri, 27 Apr 2012, Tejun Heo wrote:

> > Would it be better to release just the lockdep annotation while
> > continuing to hold the actual lock, or to really drop the lock?
> 
> Just the lockdep annotation, I think.

This is turning out to be harder than it looked.

In order to release the lockdep annotation, I need the lockdep_map
which is stored in the sysfs_dirent structure.  But when the attribute
method is called, all it is given is a pointer to the attribute itself
(which contains the lockdep_class_key but not the lockdep_map) and a
pointer to the corresponding kobject.

Is there any reasonable way to get from the kobject and the attribute
to the appropriate sysfs_dirent?  Search through all the groups
attached to the kobject?  Restrict the new interface so that it can be
used only by attributes at the kobject's top level (i.e., not in a
named group)?

Any suggestions?

Alan Stern


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-05-03 21:30             ` Alan Stern
@ 2012-05-04 16:52               ` Tejun Heo
  2012-05-04 19:08                 ` Alan Stern
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-05-04 16:52 UTC (permalink / raw)
  To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Thu, May 03, 2012 at 05:30:30PM -0400, Alan Stern wrote:
> On Fri, 27 Apr 2012, Tejun Heo wrote:
> 
> > > Would it be better to release just the lockdep annotation while
> > > continuing to hold the actual lock, or to really drop the lock?
> > 
> > Just the lockdep annotation, I think.
> 
> This is turning out to be harder than it looked.
> 
> In order to release the lockdep annotation, I need the lockdep_map
> which is stored in the sysfs_dirent structure.  But when the attribute
> method is called, all it is given is a pointer to the attribute itself
> (which contains the lockdep_class_key but not the lockdep_map) and a
> pointer to the corresponding kobject.
> 
> Is there any reasonable way to get from the kobject and the attribute
> to the appropriate sysfs_dirent?  Search through all the groups
> attached to the kobject?  Restrict the new interface so that it can be
> used only by attributes at the kobject's top level (i.e., not in a
> named group)?
> 
> Any suggestions?

Urgh... I can't think of anything pretty.  How about just marking the
attr as "I'm special" and let sysfs code override lockdep annotation?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-05-04 16:52               ` Tejun Heo
@ 2012-05-04 19:08                 ` Alan Stern
  2012-05-07 19:46                   ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-05-04 19:08 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Fri, 4 May 2012, Tejun Heo wrote:

> > Is there any reasonable way to get from the kobject and the attribute
> > to the appropriate sysfs_dirent?  Search through all the groups
> > attached to the kobject?  Restrict the new interface so that it can be
> > used only by attributes at the kobject's top level (i.e., not in a
> > named group)?
> > 
> > Any suggestions?
> 
> Urgh... I can't think of anything pretty.  How about just marking the
> attr as "I'm special" and let sysfs code override lockdep annotation?

Never mind, I figured it out.  The caller knows what the group is (if 
there's a group), so it can simply pass that information along.

To make things clearer, I broke the interface into two parts: one to 
find the sysfs_dirent and the other to handle the lockdep annotations.  
Here's the patch -- it works.  What do you think?

Alan Stern

P.S.: I'm not sure about the namespace arguments to sysfs_get_dirent().  
Is NULL appropriate?  Or should I use sysfs_ns_type(kobj->sd)?




Index: usb-3.4/include/linux/sysfs.h
===================================================================
--- usb-3.4.orig/include/linux/sysfs.h
+++ usb-3.4/include/linux/sysfs.h
@@ -121,6 +121,10 @@ struct sysfs_dirent;
 
 int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *),
 			    void *data, struct module *owner);
+struct sysfs_dirent *sysfs_find_dirent_for_attr(struct kobject *kobj,
+			const struct attribute *attr, const char *group);
+void sysfs_unannotate_readlock(struct sysfs_dirent *sd);
+void sysfs_reannotate_readlock(struct sysfs_dirent *sd);
 
 int __must_check sysfs_create_dir(struct kobject *kobj);
 void sysfs_remove_dir(struct kobject *kobj);
@@ -188,6 +192,21 @@ static inline int sysfs_schedule_callbac
 	return -ENOSYS;
 }
 
+static inline struct sysfs_dirent *sysfs_find_dirent_for_attr(
+			struct kobject *kobj,
+			const struct attribute *attr, const char *group)
+{
+	return NULL;
+}
+
+static inline void sysfs_unannotate_readlock(struct sysfs_dirent *sd)
+{
+}
+
+static inline void sysfs_reannotate_readlock(struct sysfs_dirent *sd)
+{
+}
+
 static inline int sysfs_create_dir(struct kobject *kobj)
 {
 	return 0;
Index: usb-3.4/include/linux/device.h
===================================================================
--- usb-3.4.orig/include/linux/device.h
+++ usb-3.4/include/linux/device.h
@@ -514,6 +514,11 @@ extern void device_remove_bin_file(struc
 				   const struct bin_attribute *attr);
 extern int device_schedule_callback_owner(struct device *dev,
 		void (*func)(struct device *dev), struct module *owner);
+extern void *device_start_attribute_infanticide(
+		struct device *dev, const struct device_attribute *attr,
+		const char *group);
+extern void device_end_attribute_infanticide(void *cookie);
+
 
 /* This is a macro to avoid include problems with THIS_MODULE */
 #define device_schedule_callback(dev, func)			\
Index: usb-3.4/fs/sysfs/dir.c
===================================================================
--- usb-3.4.orig/fs/sysfs/dir.c
+++ usb-3.4/fs/sysfs/dir.c
@@ -133,6 +133,74 @@ static void sysfs_unlink_sibling(struct
 }
 
 /**
+ *	sysfs_find_dirent_for_attr - find sysfs_dirent for a given kobject and attribute
+ *	@kobj: kobject the attribute is attached to
+ *	@attr: attribute whose sysfs_dirent should be found
+ *	@group: name of the group containing @attr, or NULL
+ *
+ *	Looks up and returns a pointer to the sysfs_dirent structure
+ *	corresponding to the instance of @attr in @group attached to @kobj.
+ *	Returns NULL if the sysfs_dirent cannot be found.
+ *
+ *	Retains a reference to the returned sysfs_dirent.  The caller must
+ *	release this reference, as done by sysfs_reannotate_readlock()
+ *	below.
+ */
+struct sysfs_dirent *sysfs_find_dirent_for_attr(struct kobject *kobj,
+			const struct attribute *attr, const char *group)
+{
+	struct sysfs_dirent *dir_sd;
+	struct sysfs_dirent *sd;
+
+	if (group)
+		dir_sd = sysfs_get_dirent(kobj->sd, NULL, group);
+	else
+		dir_sd = sysfs_get(kobj->sd);
+	if (!dir_sd)
+		return NULL;
+
+	sd = sysfs_get_dirent(dir_sd, NULL, attr->name);
+	sysfs_put(dir_sd);
+	return sd;
+}
+
+/**
+ *	sysfs_unannotate_readlock - drop the lockdep annotation for an attribute
+ *	@sd: sysfs_dirent for the attribute whose lock annotation is dropped
+ *
+ *	Lockdep is not able to cope with the tree-structured locks used
+ *	by the device core.  It considers all the s_active lock associated
+ *	a particular attribute to be the same.  This causes a false
+ *	positive report when a method for an attribute attached to a device
+ *	wants to remove the same attribute attached to a child device.
+ *
+ *	To avoid such false positives, the method can call this routine to
+ *	drop the lockdep annotation indicating that the attribute's dirent is
+ *	currently locked, remove the child's attribute, and then reacquire
+ *	the annotation.
+ */
+void sysfs_unannotate_readlock(struct sysfs_dirent *sd)
+{
+	rwsem_release(&sd->dep_map, 1, _RET_IP_);
+}
+
+/**
+ *	sysfs_reannotate_readlock - reacquire the lockdep annotation for an attribute
+ *	@sd: sysfs_dirent for the attribute whose lock annotation is reacquired
+ *
+ *	After an attribute method has called sysfs_unannotate_readlock()
+ *	and removed the attributes from various child devices, it should
+ *	call this routine to reacquire the lockdep annotation (and also to
+ *	drop the reference to @sd obtained by
+ *	sysfs_get_dirent_for_attribute()) before returning.
+ */
+void sysfs_reannotate_readlock(struct sysfs_dirent *sd)
+{
+	rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_);
+	sysfs_put(sd);
+}
+
+/**
  *	sysfs_get_active - get an active reference to sysfs_dirent
  *	@sd: sysfs_dirent to get an active reference to
  *
Index: usb-3.4/drivers/base/core.c
===================================================================
--- usb-3.4.orig/drivers/base/core.c
+++ usb-3.4/drivers/base/core.c
@@ -609,6 +609,48 @@ int device_schedule_callback_owner(struc
 }
 EXPORT_SYMBOL_GPL(device_schedule_callback_owner);
 
+/**
+ * device_start_attribute_infanticide - prepare to remove child attributes
+ * @dev: device passed to the calling method
+ * @attr: attribute passed to the calling method
+ * @group: name of the group containing @attr, or NULL
+ *
+ * To avoid false positive reports from lockdep, this routine should be
+ * called by an attribute method that wants to remove the same attribute
+ * from a descendant of the device for which it was called (an
+ * "infanticidal" attribute).
+ *
+ * The returned value should be treated as an opaque cookie and passed to
+ * device_end_attribute_infanticide() below after the descendant
+ * attributes have been removed.
+ */
+void *device_start_attribute_infanticide(
+		struct device *dev, const struct device_attribute *attr,
+		const char *group)
+{
+	struct sysfs_dirent *sd;
+
+	sd = sysfs_find_dirent_for_attr(&dev->kobj, &attr->attr, group);
+	if (sd)
+		sysfs_unannotate_readlock(sd);
+	return sd;
+}
+EXPORT_SYMBOL_GPL(device_start_attribute_infanticide);
+
+/**
+ * device_end_attribute_infanticide - finish removing child attributes
+ * @cookie: value returned by device_start_attribute_infanticide() above
+ *
+ * This routine should be called by attribute methods after they have
+ * finished removing the corresponding attributes from descendant devices.
+ */
+void device_end_attribute_infanticide(void *cookie)
+{
+	if (cookie)
+		sysfs_reannotate_readlock(cookie);
+}
+EXPORT_SYMBOL_GPL(device_end_attribute_infanticide);
+
 static void klist_children_get(struct klist_node *n)
 {
 	struct device_private *p = to_device_private_parent(n);
Index: usb-3.4/drivers/usb/core/sysfs.c
===================================================================
--- usb-3.4.orig/drivers/usb/core/sysfs.c
+++ usb-3.4/drivers/usb/core/sysfs.c
@@ -64,11 +64,14 @@ set_bConfigurationValue(struct device *d
 {
 	struct usb_device	*udev = to_usb_device(dev);
 	int			config, value;
+	void			*cookie;
 
 	if (sscanf(buf, "%d", &config) != 1 || config < -1 || config > 255)
 		return -EINVAL;
 	usb_lock_device(udev);
+	cookie = device_start_attribute_infanticide(dev, attr, NULL);
 	value = usb_set_configuration(udev, config);
+	device_end_attribute_infanticide(cookie);
 	usb_unlock_device(udev);
 	return (value < 0) ? value : count;
 }
@@ -588,10 +591,15 @@ static ssize_t usb_dev_authorized_store(
 	result = sscanf(buf, "%u\n", &val);
 	if (result != 1)
 		result = -EINVAL;
-	else if (val == 0)
+	else if (val == 0) {
+		void *cookie;
+
+		cookie = device_start_attribute_infanticide(dev, attr, NULL);
 		result = usb_deauthorize_device(usb_dev);
-	else
+		device_end_attribute_infanticide(cookie);
+	} else {
 		result = usb_authorize_device(usb_dev);
+	}
 	return result < 0? result : size;
 }
 
@@ -605,12 +613,15 @@ static ssize_t usb_remove_store(struct d
 {
 	struct usb_device *udev = to_usb_device(dev);
 	int rc = 0;
+	void *cookie;
 
 	usb_lock_device(udev);
 	if (udev->state != USB_STATE_NOTATTACHED) {
 
 		/* To avoid races, first unconfigure and then remove */
+		cookie = device_start_attribute_infanticide(dev, attr, NULL);
 		usb_set_configuration(udev, -1);
+		device_end_attribute_infanticide(cookie);
 		rc = usb_remove_device(udev);
 	}
 	if (rc == 0)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-05-04 19:08                 ` Alan Stern
@ 2012-05-07 19:46                   ` Tejun Heo
  2012-05-07 21:51                     ` Alan Stern
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-05-07 19:46 UTC (permalink / raw)
  To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

Hello, Alan.

On Fri, May 04, 2012 at 03:08:53PM -0400, Alan Stern wrote:
> @@ -588,10 +591,15 @@ static ssize_t usb_dev_authorized_store(
>  	result = sscanf(buf, "%u\n", &val);
>  	if (result != 1)
>  		result = -EINVAL;
> -	else if (val == 0)
> +	else if (val == 0) {
> +		void *cookie;
> +
> +		cookie = device_start_attribute_infanticide(dev, attr, NULL);
>  		result = usb_deauthorize_device(usb_dev);
> -	else
> +		device_end_attribute_infanticide(cookie);
> +	} else {
>  		result = usb_authorize_device(usb_dev);
> +	}

I *think* it looks way too huge as lockdep workaround.  We're adding a
whole separate lookup interface for this.  If looking up afterwards is
difficult, can't we get away with adding a field in struct attribute?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-05-07 19:46                   ` Tejun Heo
@ 2012-05-07 21:51                     ` Alan Stern
  2012-05-07 21:55                       ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-05-07 21:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Mon, 7 May 2012, Tejun Heo wrote:

> Hello, Alan.

Hi.

> On Fri, May 04, 2012 at 03:08:53PM -0400, Alan Stern wrote:
> > @@ -588,10 +591,15 @@ static ssize_t usb_dev_authorized_store(
> >  	result = sscanf(buf, "%u\n", &val);
> >  	if (result != 1)
> >  		result = -EINVAL;
> > -	else if (val == 0)
> > +	else if (val == 0) {
> > +		void *cookie;
> > +
> > +		cookie = device_start_attribute_infanticide(dev, attr, NULL);
> >  		result = usb_deauthorize_device(usb_dev);
> > -	else
> > +		device_end_attribute_infanticide(cookie);
> > +	} else {
> >  		result = usb_authorize_device(usb_dev);
> > +	}
> 
> I *think* it looks way too huge as lockdep workaround.  We're adding a

Well, it's not _terribly_ huge.  As far as the driver is concerned, 
it's just a start and an end call.  The lookup interface isn't even 
EXPORTed, because its only user is the device core.

> whole separate lookup interface for this.  If looking up afterwards is
> difficult, can't we get away with adding a field in struct attribute?

You mean, an "ignore this attribute for lockdep purposes" flag?  Yes, 
that would work just as well.

I guess in the end it's a question of balance.  Which has more 
overhead, adding a few function calls here and there, or adding a new 
flags field to every struct attribute?

Alan Stern


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-05-07 21:51                     ` Alan Stern
@ 2012-05-07 21:55                       ` Tejun Heo
  2012-05-08 18:53                         ` Alan Stern
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-05-07 21:55 UTC (permalink / raw)
  To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote:
> I guess in the end it's a question of balance.  Which has more 
> overhead, adding a few function calls here and there, or adding a new 
> flags field to every struct attribute?

Yes, and there are different types of overheads.  I'm happy to trade
some runtime memory overhead under debugging mode for lower code
complexity.  Lock proving is pretty expensive anyway.  I don't think
there's much point in trying to optimize some bytes from struct
attributes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-05-07 21:55                       ` Tejun Heo
@ 2012-05-08 18:53                         ` Alan Stern
  2012-05-09 17:41                           ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-05-08 18:53 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Mon, 7 May 2012, Tejun Heo wrote:

> On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote:
> > I guess in the end it's a question of balance.  Which has more 
> > overhead, adding a few function calls here and there, or adding a new 
> > flags field to every struct attribute?
> 
> Yes, and there are different types of overheads.  I'm happy to trade
> some runtime memory overhead under debugging mode for lower code
> complexity.  Lock proving is pretty expensive anyway.  I don't think
> there's much point in trying to optimize some bytes from struct
> attributes.

Okay, then what do you think about this approach?  It does seem smaller 
and simpler than the previous attempt.

And I did try to avoid unnecessary bloat; if lockdep isn't being used
then the extra attribute flag isn't present.

Alan Stern




Index: usb-3.4/include/linux/sysfs.h
===================================================================
--- usb-3.4.orig/include/linux/sysfs.h
+++ usb-3.4/include/linux/sysfs.h
@@ -27,6 +27,7 @@ struct attribute {
 	const char		*name;
 	umode_t			mode;
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
+	bool			ignore_lockdep:1;
 	struct lock_class_key	*key;
 	struct lock_class_key	skey;
 #endif
@@ -80,6 +81,17 @@ struct attribute_group {
 
 #define __ATTR_NULL { .attr = { .name = NULL } }
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define __ATTR_IGNORE_LOCKDEP(_name,_mode,_show,_store) { \
+	.attr = {.name = __stringify(_name), .mode = _mode,	\
+			.ignore_lockdep = true },		\
+	.show		= _show,				\
+	.store		= _store,				\
+}
+#else
+#define __ATTR_IGNORE_LOCKDEP	__ATTR
+#endif
+
 #define attr_name(_attr) (_attr).attr.name
 
 struct file;
Index: usb-3.4/fs/sysfs/dir.c
===================================================================
--- usb-3.4.orig/fs/sysfs/dir.c
+++ usb-3.4/fs/sysfs/dir.c
@@ -132,6 +132,24 @@ static void sysfs_unlink_sibling(struct
 	rb_erase(&sd->s_rb, &sd->s_parent->s_dir.children);
 }
 
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+
+/* Test for attributes that want to ignore lockdep for read-locking */
+static bool ignore_lockdep(struct sysfs_dirent *sd)
+{
+	return sysfs_type(sd) == SYSFS_KOBJ_ATTR &&
+			sd->s_attr.attr->ignore_lockdep;
+}
+
+#else
+
+static inline bool ignore_lockdep(struct sysfs_dirent *sd)
+{
+	return true;
+}
+
+#endif
+
 /**
  *	sysfs_get_active - get an active reference to sysfs_dirent
  *	@sd: sysfs_dirent to get an active reference to
@@ -155,15 +173,17 @@ struct sysfs_dirent *sysfs_get_active(st
 			return NULL;
 
 		t = atomic_cmpxchg(&sd->s_active, v, v + 1);
-		if (likely(t == v)) {
-			rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_);
-			return sd;
-		}
+		if (likely(t == v))
+			break;
 		if (t < 0)
 			return NULL;
 
 		cpu_relax();
 	}
+
+	if (likely(!ignore_lockdep(sd)))
+		rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_);
+	return sd;
 }
 
 /**
@@ -180,7 +200,8 @@ void sysfs_put_active(struct sysfs_diren
 	if (unlikely(!sd))
 		return;
 
-	rwsem_release(&sd->dep_map, 1, _RET_IP_);
+	if (likely(!ignore_lockdep(sd)))
+		rwsem_release(&sd->dep_map, 1, _RET_IP_);
 	v = atomic_dec_return(&sd->s_active);
 	if (likely(v != SD_DEACTIVATED_BIAS))
 		return;
Index: usb-3.4/include/linux/device.h
===================================================================
--- usb-3.4.orig/include/linux/device.h
+++ usb-3.4/include/linux/device.h
@@ -503,6 +503,9 @@ ssize_t device_store_int(struct device *
 #define DEVICE_INT_ATTR(_name, _mode, _var) \
 	struct dev_ext_attribute dev_attr_##_name = \
 		{ __ATTR(_name, _mode, device_show_ulong, device_store_ulong), &(_var) }
+#define DEVICE_ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store) \
+	struct device_attribute dev_attr_##_name =		\
+		__ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store)
 
 extern int device_create_file(struct device *device,
 			      const struct device_attribute *entry);
Index: usb-3.4/drivers/usb/core/sysfs.c
===================================================================
--- usb-3.4.orig/drivers/usb/core/sysfs.c
+++ usb-3.4/drivers/usb/core/sysfs.c
@@ -73,7 +73,7 @@ set_bConfigurationValue(struct device *d
 	return (value < 0) ? value : count;
 }
 
-static DEVICE_ATTR(bConfigurationValue, S_IRUGO | S_IWUSR,
+static DEVICE_ATTR_IGNORE_LOCKDEP(bConfigurationValue, S_IRUGO | S_IWUSR,
 		show_bConfigurationValue, set_bConfigurationValue);
 
 /* String fields */
@@ -595,7 +595,7 @@ static ssize_t usb_dev_authorized_store(
 	return result < 0? result : size;
 }
 
-static DEVICE_ATTR(authorized, 0644,
+static DEVICE_ATTR_IGNORE_LOCKDEP(authorized, 0644,
 	    usb_dev_authorized_show, usb_dev_authorized_store);
 
 /* "Safely remove a device" */
@@ -618,7 +618,7 @@ static ssize_t usb_remove_store(struct d
 	usb_unlock_device(udev);
 	return rc;
 }
-static DEVICE_ATTR(remove, 0200, NULL, usb_remove_store);
+static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, usb_remove_store);
 
 
 static struct attribute *dev_attrs[] = {


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-05-08 18:53                         ` Alan Stern
@ 2012-05-09 17:41                           ` Tejun Heo
  2012-05-09 17:47                             ` Alan Stern
  0 siblings, 1 reply; 24+ messages in thread
From: Tejun Heo @ 2012-05-09 17:41 UTC (permalink / raw)
  To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

Hello,

On Tue, May 08, 2012 at 02:53:11PM -0400, Alan Stern wrote:
> On Mon, 7 May 2012, Tejun Heo wrote:
> 
> > On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote:
> > > I guess in the end it's a question of balance.  Which has more 
> > > overhead, adding a few function calls here and there, or adding a new 
> > > flags field to every struct attribute?
> > 
> > Yes, and there are different types of overheads.  I'm happy to trade
> > some runtime memory overhead under debugging mode for lower code
> > complexity.  Lock proving is pretty expensive anyway.  I don't think
> > there's much point in trying to optimize some bytes from struct
> > attributes.
> 
> Okay, then what do you think about this approach?  It does seem smaller 
> and simpler than the previous attempt.
> 
> And I did try to avoid unnecessary bloat; if lockdep isn't being used
> then the extra attribute flag isn't present.

Yeap, looks good to me.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-05-09 17:41                           ` Tejun Heo
@ 2012-05-09 17:47                             ` Alan Stern
  2012-05-09 17:48                               ` Tejun Heo
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Stern @ 2012-05-09 17:47 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Wed, 9 May 2012, Tejun Heo wrote:

> Hello,
> 
> On Tue, May 08, 2012 at 02:53:11PM -0400, Alan Stern wrote:
> > On Mon, 7 May 2012, Tejun Heo wrote:
> > 
> > > On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote:
> > > > I guess in the end it's a question of balance.  Which has more 
> > > > overhead, adding a few function calls here and there, or adding a new 
> > > > flags field to every struct attribute?
> > > 
> > > Yes, and there are different types of overheads.  I'm happy to trade
> > > some runtime memory overhead under debugging mode for lower code
> > > complexity.  Lock proving is pretty expensive anyway.  I don't think
> > > there's much point in trying to optimize some bytes from struct
> > > attributes.
> > 
> > Okay, then what do you think about this approach?  It does seem smaller 
> > and simpler than the previous attempt.
> > 
> > And I did try to avoid unnecessary bloat; if lockdep isn't being used
> > then the extra attribute flag isn't present.
> 
> Yeap, looks good to me.

Unless there are any objections from Eric or Peter in the next few
days, I'll submit it.  Can I add your Acked-by?

Alan Stern


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Lockdep false positive in sysfs
  2012-05-09 17:47                             ` Alan Stern
@ 2012-05-09 17:48                               ` Tejun Heo
  0 siblings, 0 replies; 24+ messages in thread
From: Tejun Heo @ 2012-05-09 17:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list

On Wed, May 09, 2012 at 01:47:34PM -0400, Alan Stern wrote:
> On Wed, 9 May 2012, Tejun Heo wrote:
> 
> > Hello,
> > 
> > On Tue, May 08, 2012 at 02:53:11PM -0400, Alan Stern wrote:
> > > On Mon, 7 May 2012, Tejun Heo wrote:
> > > 
> > > > On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote:
> > > > > I guess in the end it's a question of balance.  Which has more 
> > > > > overhead, adding a few function calls here and there, or adding a new 
> > > > > flags field to every struct attribute?
> > > > 
> > > > Yes, and there are different types of overheads.  I'm happy to trade
> > > > some runtime memory overhead under debugging mode for lower code
> > > > complexity.  Lock proving is pretty expensive anyway.  I don't think
> > > > there's much point in trying to optimize some bytes from struct
> > > > attributes.
> > > 
> > > Okay, then what do you think about this approach?  It does seem smaller 
> > > and simpler than the previous attempt.
> > > 
> > > And I did try to avoid unnecessary bloat; if lockdep isn't being used
> > > then the extra attribute flag isn't present.
> > 
> > Yeap, looks good to me.
> 
> Unless there are any objections from Eric or Peter in the next few
> days, I'll submit it.  Can I add your Acked-by?

Sure.

Thanks for the persistence.

-- 
tejun

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2012-05-09 17:49 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-25 18:58 Lockdep false positive in sysfs Alan Stern
2012-04-25 21:59 ` Tejun Heo
2012-04-26  8:16   ` Eric W. Biederman
2012-04-26 18:14     ` Alan Stern
2012-04-26 22:17       ` Tejun Heo
2012-04-27 15:57         ` Alan Stern
2012-04-27 16:09           ` Tejun Heo
2012-05-03 21:30             ` Alan Stern
2012-05-04 16:52               ` Tejun Heo
2012-05-04 19:08                 ` Alan Stern
2012-05-07 19:46                   ` Tejun Heo
2012-05-07 21:51                     ` Alan Stern
2012-05-07 21:55                       ` Tejun Heo
2012-05-08 18:53                         ` Alan Stern
2012-05-09 17:41                           ` Tejun Heo
2012-05-09 17:47                             ` Alan Stern
2012-05-09 17:48                               ` Tejun Heo
2012-04-27 16:27           ` Eric W. Biederman
2012-04-27 18:27             ` Alan Stern
2012-04-27 20:17               ` Tejun Heo
2012-04-27 21:09                 ` Eric W. Biederman
2012-04-27 21:16                   ` Tejun Heo
2012-04-29  2:00                   ` Alan Stern
2012-04-29  2:35                     ` Eric W. Biederman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.