* Lockdep false positive in sysfs @ 2012-04-25 18:58 Alan Stern 2012-04-25 21:59 ` Tejun Heo 0 siblings, 1 reply; 24+ messages in thread From: Alan Stern @ 2012-04-25 18:58 UTC (permalink / raw) To: Peter Zijlstra, Tejun Heo; +Cc: Kernel development list Peter and Tejun: Here's my problem, which affects several sysfs attribute methods in the USB subsystem: Sysfs attribute A is attached to a USB device D. When the user writes to A, the corresponding store method unregisters D's children (it does not unregister D, though). Now, some of these children may also be USB devices (i.e., if D is a hub), and therefore may have the same set of sysfs attributes. As a result, A's store method for D will end up removing the A attribute for device E, where E is a child of D. This causes lockdep to complain. When A's method is called, sysfs tells lockdep that it holds a readlock for the s_active "rwsem" associated with the A attribute for D. However the sysfs routine that removes attributes tells lockdep that it is going to get a writelock for the s_active associated with the A attribute for E, which is in the same lockdep class since it belongs to the same attribute. The resulting splat looks like this: [ 1004.679564] ============================================= [ 1004.680053] [ INFO: possible recursive locking detected ] [ 1004.680053] 3.4.0-rc4+ #17 Tainted: G O [ 1004.680053] --------------------------------------------- [ 1004.680053] bash/1256 is trying to acquire lock: [ 1004.680053] (s_active#73){++++.+}, at: [<c10c9c2b>] sysfs_hash_and_remove+0x60/0x71 [ 1004.680053] [ 1004.680053] but task is already holding lock: [ 1004.680053] (s_active#73){++++.+}, at: [<c10c9fce>] sysfs_write_file+0xa7/0xea [ 1004.680053] [ 1004.680053] other info that might help us debug this: [ 1004.680053] Possible unsafe locking scenario: [ 1004.680053] [ 1004.680053] CPU0 [ 1004.680053] ---- [ 1004.680053] lock(s_active#73); [ 1004.680053] lock(s_active#73); [ 1004.680053] [ 1004.680053] *** DEADLOCK *** [ 1004.680053] [ 1004.680053] May be due to missing lock nesting notation [ 1004.680053] [ 1004.680053] 4 locks held by bash/1256: [ 1004.680053] #0: (&buffer->mutex){+.+.+.}, at: [<c10c9f4c>] sysfs_write_file+0x25/0xea [ 1004.680053] #1: (s_active#73){++++.+}, at: [<c10c9fce>] sysfs_write_file+0xa7/0xea [ 1004.680053] #2: (&__lockdep_no_validate__){......}, at: [<f002c376>] usb_deauthorize_device+0x16/0xb2 [usbcore] [ 1004.680053] #3: (&__lockdep_no_validate__){......}, at: [<c1185767>] device_release_driver+0x13/0x25 [ 1004.680053] [ 1004.680053] stack backtrace: [ 1004.680053] Pid: 1256, comm: bash Tainted: G O 3.4.0-rc4+ #17 [ 1004.680053] Call Trace: [ 1004.680053] [<c101ea89>] ? console_unlock+0x1ad/0x1d3 [ 1004.680053] [<c104f712>] __lock_acquire+0x82f/0xb8a [ 1004.680053] [<c104df6d>] ? mark_lock+0x26/0x21f [ 1004.680053] [<c104e47a>] ? debug_check_no_locks_freed+0x112/0x11c [ 1004.680053] [<c104e31c>] ? trace_hardirqs_on_caller+0x13f/0x17e [ 1004.680053] [<c104e366>] ? trace_hardirqs_on+0xb/0xd [ 1004.680053] [<c104fde5>] lock_acquire+0x5e/0x75 [ 1004.680053] [<c10c9c2b>] ? sysfs_hash_and_remove+0x60/0x71 [ 1004.680053] [<c10caf8c>] sysfs_addrm_finish+0x86/0xd2 [ 1004.680053] [<c10c9c2b>] ? sysfs_hash_and_remove+0x60/0x71 [ 1004.680053] [<c10c0000>] ? m_next+0x4c/0x56 [ 1004.680053] [<c10c0000>] ? m_next+0x4c/0x56 [ 1004.680053] [<c10c9c2b>] sysfs_hash_and_remove+0x60/0x71 [ 1004.680053] [<c10cc329>] sysfs_remove_group+0x56/0x77 [ 1004.680053] [<c118331e>] device_remove_groups+0x17/0x25 [ 1004.680053] [<c11834b5>] device_remove_attrs+0x1c/0x47 [ 1004.680053] [<c11835c6>] device_del+0xe6/0x135 [ 1004.680053] [<f002b658>] usb_disconnect+0x9a/0xd4 [usbcore] [ 1004.680053] [<f002b6d6>] hub_quiesce+0x44/0x83 [usbcore] [ 1004.680053] [<f002b86c>] hub_disconnect+0x6b/0xdb [usbcore] [ 1004.680053] [<f0032f1d>] usb_unbind_interface+0x42/0xf9 [usbcore] [ 1004.680053] [<c1185719>] __device_release_driver+0x66/0xa1 [ 1004.680053] [<c118576e>] device_release_driver+0x1a/0x25 [ 1004.680053] [<c1185398>] bus_remove_device+0xa3/0xb0 [ 1004.680053] [<c11835cd>] device_del+0xed/0x135 [ 1004.680053] [<f00316fb>] usb_disable_device+0x79/0x19b [usbcore] [ 1004.680053] [<f0031e1d>] usb_set_configuration+0x18e/0x513 [usbcore] [ 1004.680053] [<f002c397>] usb_deauthorize_device+0x37/0xb2 [usbcore] [ 1004.680053] [<f0035204>] usb_dev_authorized_store+0x31/0x43 [usbcore] [ 1004.680053] [<f00351d3>] ? set_persist+0x67/0x67 [usbcore] [ 1004.680053] [<c1182f83>] dev_attr_store+0x1b/0x23 [ 1004.680053] [<c10c9fe5>] sysfs_write_file+0xbe/0xea [ 1004.680053] [<c10c9f27>] ? sysfs_open_file+0x1c6/0x1c6 [ 1004.680053] [<c108ebc0>] vfs_write+0x74/0xa0 [ 1004.680053] [<c108ed21>] sys_write+0x3b/0x5d [ 1004.680053] [<c1221210>] sysenter_do_call+0x12/0x36 Normally one solves this sort of problem by annotating one of the locks as a nested call. But there doesn't appear to be any reasonable way of doing this for sysfs attributes. I could work around the problem by having the method do everything in a workqueue, but I would prefer not to -- the actions taken by these attributes really ought to be synchronous. Do you guys have any suggestions for better solutions? Alan Stern ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-25 18:58 Lockdep false positive in sysfs Alan Stern @ 2012-04-25 21:59 ` Tejun Heo 2012-04-26 8:16 ` Eric W. Biederman 0 siblings, 1 reply; 24+ messages in thread From: Tejun Heo @ 2012-04-25 21:59 UTC (permalink / raw) To: Alan Stern; +Cc: Peter Zijlstra, Kernel development list Hello, Alan. On Wed, Apr 25, 2012 at 02:58:28PM -0400, Alan Stern wrote: > Peter and Tejun: > > Here's my problem, which affects several sysfs attribute methods in the > USB subsystem: > > Sysfs attribute A is attached to a USB device D. When the user writes > to A, the corresponding store method unregisters D's children (it does > not unregister D, though). > > Now, some of these children may also be USB devices (i.e., if D is a > hub), and therefore may have the same set of sysfs attributes. As a > result, A's store method for D will end up removing the A attribute for > device E, where E is a child of D. > > This causes lockdep to complain. When A's method is called, sysfs > tells lockdep that it holds a readlock for the s_active "rwsem" > associated with the A attribute for D. However the sysfs routine that > removes attributes tells lockdep that it is going to get a writelock > for the s_active associated with the A attribute for E, which is in the > same lockdep class since it belongs to the same attribute. Hmmm.... This happens because, by default, sysfs_dirents for the same attr share the same lockdep key. This happens from sysfs_dirent_init_lockdep(). Hmm.... we can, * Somehow assign different keys to sysfs_dirents for the specific attr. Use array of attrs indexed by bus depth? * Add a flag / whatever to attr indicating that the files of the attribute may be removed recursively (lockdep-wise) and update either read or write path to use subclass. Any better ideas? Thanks. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-25 21:59 ` Tejun Heo @ 2012-04-26 8:16 ` Eric W. Biederman 2012-04-26 18:14 ` Alan Stern 0 siblings, 1 reply; 24+ messages in thread From: Eric W. Biederman @ 2012-04-26 8:16 UTC (permalink / raw) To: Tejun Heo; +Cc: Alan Stern, Peter Zijlstra, Kernel development list Tejun Heo <tj@kernel.org> writes: > Hello, Alan. > > On Wed, Apr 25, 2012 at 02:58:28PM -0400, Alan Stern wrote: >> Peter and Tejun: >> >> Here's my problem, which affects several sysfs attribute methods in the >> USB subsystem: >> >> Sysfs attribute A is attached to a USB device D. When the user writes >> to A, the corresponding store method unregisters D's children (it does >> not unregister D, though). >> >> Now, some of these children may also be USB devices (i.e., if D is a >> hub), and therefore may have the same set of sysfs attributes. As a >> result, A's store method for D will end up removing the A attribute for >> device E, where E is a child of D. Before we get too far getting lockdep to shut up let me ask. Is this really safe? The code looks like it tries to be safe but in the case of recursion I don't buy it. You should be having this same problem with other locks and I don't see the use of mutex_lock_nested anywhere. I looked into this a couple years ago with regards to pci devices, and my conclusion at the time was that the device core nor the pci code was multi-thread hotplug safe, and that the easiest fix was to use a single threaded workqueue. What I see right now with the device_lock and usb looks better but I'm not at all certain I believe that if you attempt to remove different levels of the hierarchy at the same time from different processes that this code is really safe. Could you explain to me how the locking works in the usb layer to prevent problems with different processes removing different layers of the same hierarchy? Especially having it work without having having to do something interesting with lockdep. >> This causes lockdep to complain. When A's method is called, sysfs >> tells lockdep that it holds a readlock for the s_active "rwsem" >> associated with the A attribute for D. However the sysfs routine that >> removes attributes tells lockdep that it is going to get a writelock >> for the s_active associated with the A attribute for E, which is in the >> same lockdep class since it belongs to the same attribute. > > Hmmm.... This happens because, by default, sysfs_dirents for the same > attr share the same lockdep key. This happens from > sysfs_dirent_init_lockdep(). Hmm.... we can, > > * Somehow assign different keys to sysfs_dirents for the specific > attr. Use array of attrs indexed by bus depth? Possible with sysfs_attr_init but pretty ugly. Especially since it sounds like this is a situation that does not presuppose a maximum depth. I do remember that the lockdep keys must be statically allocated which makes this a challenge. Or it might be a remote possibility to adapt things a little bit. What makes this different from most of the recursing lock strategies that I know of is that we are actually in fact taking the same "lock" in the same place, and performing the same action. > * Add a flag / whatever to attr indicating that the files of the > attribute may be removed recursively (lockdep-wise) and update > either read or write path to use subclass. > > Any better ideas? I'm not going to worry about it too much until someone explains to me how usb hotplug is really safe and how of all the locks taken to make it happen somehow only the sysfs lock triggers a lockdep warning because we are recursing. We should be seeing the same issue with the device lock if the device lock actually provides sufficient protection to be useful. Eric ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-26 8:16 ` Eric W. Biederman @ 2012-04-26 18:14 ` Alan Stern 2012-04-26 22:17 ` Tejun Heo 0 siblings, 1 reply; 24+ messages in thread From: Alan Stern @ 2012-04-26 18:14 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list On Thu, 26 Apr 2012, Eric W. Biederman wrote: > Tejun Heo <tj@kernel.org> writes: > > > Hello, Alan. > > > > On Wed, Apr 25, 2012 at 02:58:28PM -0400, Alan Stern wrote: > >> Peter and Tejun: > >> > >> Here's my problem, which affects several sysfs attribute methods in the > >> USB subsystem: > >> > >> Sysfs attribute A is attached to a USB device D. When the user writes > >> to A, the corresponding store method unregisters D's children (it does > >> not unregister D, though). > >> > >> Now, some of these children may also be USB devices (i.e., if D is a > >> hub), and therefore may have the same set of sysfs attributes. As a > >> result, A's store method for D will end up removing the A attribute for > >> device E, where E is a child of D. > > Before we get too far getting lockdep to shut up let me ask. Is this > really safe? The code looks like it tries to be safe but in the case > of recursion I don't buy it. You should be having this same problem > with other locks and I don't see the use of mutex_lock_nested anywhere. You're right, of course. In the USB subsystem, configuration changes (which are what I was talking about above) and device unregistration are protected by the mutex of the underlying struct device. Thus, you're not allowed to change a device's configuration (which in the case of a hub would end up unregistering all the children) unless you already hold the device's lock, and you're not allowed to unregister a USB device unless you already hold its parent's lock. The reason this doesn't provoke any objections from lockdep is very simple: The mutex in struct device is explicitly set to the "novalidate" class in device_initialize(). Therefore lockdep essentially ignores it. We did this because after a fair amount of back-and-forth discussion, Peter and I were unable to come up with a good way of treating the tree-structured locks in the device hierarchy. (Actually it's a forest, not a tree, which makes the situation even worse.) I don't know of any other locks using the "novalidate" class, but I haven't looked to see. In any case, the locks used by sysfs do _not_ belong to this class, which is a good thing because we really do want to catch cases where an attribute's method tries to remove that same attribute -- so-called "suicidal" attributes. But that's not what's happening here. In this situation, attribute A for device D wants to remove attribute A for device E, not for D. This is safe, not suicidal, but there's currently no way to tell lockdep. Does this answer your question? I'm not quite certain what you were asking. > I looked into this a couple years ago with regards to pci devices, and > my conclusion at the time was that the device core nor the pci > code was multi-thread hotplug safe, and that the easiest fix was > to use a single threaded workqueue. As far as I know, the device core and the USB core are both multi-thread hotplug safe. I'm not as familiar with the PCI core. > What I see right now with the device_lock and usb looks better but > I'm not at all certain I believe that if you attempt to remove different > levels of the hierarchy at the same time from different processes > that this code is really safe. Could you explain to me how the > locking works in the usb layer to prevent problems with different > processes removing different layers of the same hierarchy? > > Especially having it work without having having to do something > interesting with lockdep. The locking rules are explained above. Let's consider an example. Suppose we have A - B - C, where A is the parent of B and B is the parent of C. Suppose thread 1 tries to remove B and concurrently, thread 2 tries to remove C. The resulting course of events depends on which thread acquires B's mutex first. Here's one possibility: Thread 1 Thread 2 -------- -------- usb_lock_device(A) usb_lock_device(B) usb_disconnect(B) usb_disconnect(C) usb_lock_device(B) (blocks) usb_lock_device(C) ... various housekeeping ... usb_unlock_device(C) device_del(&C->dev) put_device(&C->dev) usb_unlock_device(B) (acquires B's mutex) ... various housekeeping, including: remove any children of B (but C is not a child of B any more) ... end of housekeeping usb_unlock_device(B) device_del(&B->dev) put_device(&B->dev) usb_unlock_device(A) Here's the other: Thread 1 Thread 2 -------- -------- usb_lock_device(A) usb_disconnect(B) usb_lock_device(B) ... various housekeeping, including: usb_set_device_state(B, USB_STATE_NOTATTACHED) usb_disconnect(C) (recursive call) usb_lock_device(C) ... various housekeeping ... usb_unlock_device(C) usb_lock_device(B) (blocks) device_del(&C->dev) put_device(&C->dev) ... end of housekeeping usb_unlock_device(B) (acquires B's mutex) device_del(&B->dev) Sees that B's state is now USB_STATE_NOTATTACHED, so does not try to do anything about C usb_unlock_device(B) put_device(&B->dev) usb_unlock_device(A) Note that device_del() internally grabs the device's mutex, so thread 1's put_device(&B->dev) doesn't occur until after thread 2 unlocks B. > >> This causes lockdep to complain. When A's method is called, sysfs > >> tells lockdep that it holds a readlock for the s_active "rwsem" > >> associated with the A attribute for D. However the sysfs routine that > >> removes attributes tells lockdep that it is going to get a writelock > >> for the s_active associated with the A attribute for E, which is in the > >> same lockdep class since it belongs to the same attribute. > > > > Hmmm.... This happens because, by default, sysfs_dirents for the same > > attr share the same lockdep key. This happens from > > sysfs_dirent_init_lockdep(). Hmm.... we can, > > > > * Somehow assign different keys to sysfs_dirents for the specific > > attr. Use array of attrs indexed by bus depth? > > Possible with sysfs_attr_init but pretty ugly. Especially since it > sounds like this is a situation that does not presuppose a maximum > depth. I do remember that the lockdep keys must be statically allocated > which makes this a challenge. I agree; this doesn't seem like a good approach. > Or it might be a remote possibility to adapt things a little bit. > What makes this different from most of the recursing lock strategies > that I know of is that we are actually in fact taking the same > "lock" in the same place, and performing the same action. Well, not exactly the same place. The readlock is acquired in sysfs_get_active() and the writelock is acquired in sysfs_deactivate(). > > * Add a flag / whatever to attr indicating that the files of the > > attribute may be removed recursively (lockdep-wise) and update > > either read or write path to use subclass. > > > > Any better ideas? > > I'm not going to worry about it too much until someone explains to me > how usb hotplug is really safe and how of all the locks taken to > make it happen somehow only the sysfs lock triggers a lockdep warning > because we are recursing. We should be seeing the same issue with > the device lock if the device lock actually provides sufficient > protection to be useful. Another idea is to have A's method temporarily drop the sysfs readlock. Of course that would put the onus on the USB core of guaranteeing that A cannot be removed while this happens, but we can handle that. Alan Stern ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-26 18:14 ` Alan Stern @ 2012-04-26 22:17 ` Tejun Heo 2012-04-27 15:57 ` Alan Stern 0 siblings, 1 reply; 24+ messages in thread From: Tejun Heo @ 2012-04-26 22:17 UTC (permalink / raw) To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Thu, Apr 26, 2012 at 02:14:30PM -0400, Alan Stern wrote: > > > Hmmm.... This happens because, by default, sysfs_dirents for the same > > > attr share the same lockdep key. This happens from > > > sysfs_dirent_init_lockdep(). Hmm.... we can, > > > > > > * Somehow assign different keys to sysfs_dirents for the specific > > > attr. Use array of attrs indexed by bus depth? > > > > Possible with sysfs_attr_init but pretty ugly. Especially since it > > sounds like this is a situation that does not presuppose a maximum > > depth. I do remember that the lockdep keys must be statically allocated > > which makes this a challenge. The depth is limited by USB spec. > I agree; this doesn't seem like a good approach. It sure isn't pretty but probably best matches the situation in the sense that lockdep would actually be able to know about the nesting going on. > Another idea is to have A's method temporarily drop the sysfs readlock. > Of course that would put the onus on the USB core of guaranteeing that > A cannot be removed while this happens, but we can handle that. Yeah, that's an easier way out. Please make it a proper sysfs API call tho so that people working on sysfs later can know of the special case. Thanks. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-26 22:17 ` Tejun Heo @ 2012-04-27 15:57 ` Alan Stern 2012-04-27 16:09 ` Tejun Heo 2012-04-27 16:27 ` Eric W. Biederman 0 siblings, 2 replies; 24+ messages in thread From: Alan Stern @ 2012-04-27 15:57 UTC (permalink / raw) To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Thu, 26 Apr 2012, Tejun Heo wrote: > On Thu, Apr 26, 2012 at 02:14:30PM -0400, Alan Stern wrote: > > > > Hmmm.... This happens because, by default, sysfs_dirents for the same > > > > attr share the same lockdep key. This happens from > > > > sysfs_dirent_init_lockdep(). Hmm.... we can, > > > > > > > > * Somehow assign different keys to sysfs_dirents for the specific > > > > attr. Use array of attrs indexed by bus depth? > > > > > > Possible with sysfs_attr_init but pretty ugly. Especially since it > > > sounds like this is a situation that does not presuppose a maximum > > > depth. I do remember that the lockdep keys must be statically allocated > > > which makes this a challenge. > > The depth is limited by USB spec. > > > I agree; this doesn't seem like a good approach. > > It sure isn't pretty but probably best matches the situation in the > sense that lockdep would actually be able to know about the nesting > going on. By the way, do you know why attribute structures allow for dynamic keys as well as static keys? I see dynamic keys are used by attribute containers, but I don't understand why. > > Another idea is to have A's method temporarily drop the sysfs readlock. > > Of course that would put the onus on the USB core of guaranteeing that > > A cannot be removed while this happens, but we can handle that. > > Yeah, that's an easier way out. Please make it a proper sysfs API > call tho so that people working on sysfs later can know of the special > case. I will. Would it be better to release just the lockdep annotation while continuing to hold the actual lock, or to really drop the lock? Alan Stern ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-27 15:57 ` Alan Stern @ 2012-04-27 16:09 ` Tejun Heo 2012-05-03 21:30 ` Alan Stern 2012-04-27 16:27 ` Eric W. Biederman 1 sibling, 1 reply; 24+ messages in thread From: Tejun Heo @ 2012-04-27 16:09 UTC (permalink / raw) To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list Hello, On Fri, Apr 27, 2012 at 11:57:02AM -0400, Alan Stern wrote: > By the way, do you know why attribute structures allow for dynamic keys > as well as static keys? I see dynamic keys are used by attribute > containers, but I don't understand why. I have no idea. Eric? > > > Another idea is to have A's method temporarily drop the sysfs readlock. > > > Of course that would put the onus on the USB core of guaranteeing that > > > A cannot be removed while this happens, but we can handle that. > > > > Yeah, that's an easier way out. Please make it a proper sysfs API > > call tho so that people working on sysfs later can know of the special > > case. > > I will. > > Would it be better to release just the lockdep annotation while > continuing to hold the actual lock, or to really drop the lock? Just the lockdep annotation, I think. Thanks. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-27 16:09 ` Tejun Heo @ 2012-05-03 21:30 ` Alan Stern 2012-05-04 16:52 ` Tejun Heo 0 siblings, 1 reply; 24+ messages in thread From: Alan Stern @ 2012-05-03 21:30 UTC (permalink / raw) To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Fri, 27 Apr 2012, Tejun Heo wrote: > > Would it be better to release just the lockdep annotation while > > continuing to hold the actual lock, or to really drop the lock? > > Just the lockdep annotation, I think. This is turning out to be harder than it looked. In order to release the lockdep annotation, I need the lockdep_map which is stored in the sysfs_dirent structure. But when the attribute method is called, all it is given is a pointer to the attribute itself (which contains the lockdep_class_key but not the lockdep_map) and a pointer to the corresponding kobject. Is there any reasonable way to get from the kobject and the attribute to the appropriate sysfs_dirent? Search through all the groups attached to the kobject? Restrict the new interface so that it can be used only by attributes at the kobject's top level (i.e., not in a named group)? Any suggestions? Alan Stern ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-05-03 21:30 ` Alan Stern @ 2012-05-04 16:52 ` Tejun Heo 2012-05-04 19:08 ` Alan Stern 0 siblings, 1 reply; 24+ messages in thread From: Tejun Heo @ 2012-05-04 16:52 UTC (permalink / raw) To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Thu, May 03, 2012 at 05:30:30PM -0400, Alan Stern wrote: > On Fri, 27 Apr 2012, Tejun Heo wrote: > > > > Would it be better to release just the lockdep annotation while > > > continuing to hold the actual lock, or to really drop the lock? > > > > Just the lockdep annotation, I think. > > This is turning out to be harder than it looked. > > In order to release the lockdep annotation, I need the lockdep_map > which is stored in the sysfs_dirent structure. But when the attribute > method is called, all it is given is a pointer to the attribute itself > (which contains the lockdep_class_key but not the lockdep_map) and a > pointer to the corresponding kobject. > > Is there any reasonable way to get from the kobject and the attribute > to the appropriate sysfs_dirent? Search through all the groups > attached to the kobject? Restrict the new interface so that it can be > used only by attributes at the kobject's top level (i.e., not in a > named group)? > > Any suggestions? Urgh... I can't think of anything pretty. How about just marking the attr as "I'm special" and let sysfs code override lockdep annotation? Thanks. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-05-04 16:52 ` Tejun Heo @ 2012-05-04 19:08 ` Alan Stern 2012-05-07 19:46 ` Tejun Heo 0 siblings, 1 reply; 24+ messages in thread From: Alan Stern @ 2012-05-04 19:08 UTC (permalink / raw) To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Fri, 4 May 2012, Tejun Heo wrote: > > Is there any reasonable way to get from the kobject and the attribute > > to the appropriate sysfs_dirent? Search through all the groups > > attached to the kobject? Restrict the new interface so that it can be > > used only by attributes at the kobject's top level (i.e., not in a > > named group)? > > > > Any suggestions? > > Urgh... I can't think of anything pretty. How about just marking the > attr as "I'm special" and let sysfs code override lockdep annotation? Never mind, I figured it out. The caller knows what the group is (if there's a group), so it can simply pass that information along. To make things clearer, I broke the interface into two parts: one to find the sysfs_dirent and the other to handle the lockdep annotations. Here's the patch -- it works. What do you think? Alan Stern P.S.: I'm not sure about the namespace arguments to sysfs_get_dirent(). Is NULL appropriate? Or should I use sysfs_ns_type(kobj->sd)? Index: usb-3.4/include/linux/sysfs.h =================================================================== --- usb-3.4.orig/include/linux/sysfs.h +++ usb-3.4/include/linux/sysfs.h @@ -121,6 +121,10 @@ struct sysfs_dirent; int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *), void *data, struct module *owner); +struct sysfs_dirent *sysfs_find_dirent_for_attr(struct kobject *kobj, + const struct attribute *attr, const char *group); +void sysfs_unannotate_readlock(struct sysfs_dirent *sd); +void sysfs_reannotate_readlock(struct sysfs_dirent *sd); int __must_check sysfs_create_dir(struct kobject *kobj); void sysfs_remove_dir(struct kobject *kobj); @@ -188,6 +192,21 @@ static inline int sysfs_schedule_callbac return -ENOSYS; } +static inline struct sysfs_dirent *sysfs_find_dirent_for_attr( + struct kobject *kobj, + const struct attribute *attr, const char *group) +{ + return NULL; +} + +static inline void sysfs_unannotate_readlock(struct sysfs_dirent *sd) +{ +} + +static inline void sysfs_reannotate_readlock(struct sysfs_dirent *sd) +{ +} + static inline int sysfs_create_dir(struct kobject *kobj) { return 0; Index: usb-3.4/include/linux/device.h =================================================================== --- usb-3.4.orig/include/linux/device.h +++ usb-3.4/include/linux/device.h @@ -514,6 +514,11 @@ extern void device_remove_bin_file(struc const struct bin_attribute *attr); extern int device_schedule_callback_owner(struct device *dev, void (*func)(struct device *dev), struct module *owner); +extern void *device_start_attribute_infanticide( + struct device *dev, const struct device_attribute *attr, + const char *group); +extern void device_end_attribute_infanticide(void *cookie); + /* This is a macro to avoid include problems with THIS_MODULE */ #define device_schedule_callback(dev, func) \ Index: usb-3.4/fs/sysfs/dir.c =================================================================== --- usb-3.4.orig/fs/sysfs/dir.c +++ usb-3.4/fs/sysfs/dir.c @@ -133,6 +133,74 @@ static void sysfs_unlink_sibling(struct } /** + * sysfs_find_dirent_for_attr - find sysfs_dirent for a given kobject and attribute + * @kobj: kobject the attribute is attached to + * @attr: attribute whose sysfs_dirent should be found + * @group: name of the group containing @attr, or NULL + * + * Looks up and returns a pointer to the sysfs_dirent structure + * corresponding to the instance of @attr in @group attached to @kobj. + * Returns NULL if the sysfs_dirent cannot be found. + * + * Retains a reference to the returned sysfs_dirent. The caller must + * release this reference, as done by sysfs_reannotate_readlock() + * below. + */ +struct sysfs_dirent *sysfs_find_dirent_for_attr(struct kobject *kobj, + const struct attribute *attr, const char *group) +{ + struct sysfs_dirent *dir_sd; + struct sysfs_dirent *sd; + + if (group) + dir_sd = sysfs_get_dirent(kobj->sd, NULL, group); + else + dir_sd = sysfs_get(kobj->sd); + if (!dir_sd) + return NULL; + + sd = sysfs_get_dirent(dir_sd, NULL, attr->name); + sysfs_put(dir_sd); + return sd; +} + +/** + * sysfs_unannotate_readlock - drop the lockdep annotation for an attribute + * @sd: sysfs_dirent for the attribute whose lock annotation is dropped + * + * Lockdep is not able to cope with the tree-structured locks used + * by the device core. It considers all the s_active lock associated + * a particular attribute to be the same. This causes a false + * positive report when a method for an attribute attached to a device + * wants to remove the same attribute attached to a child device. + * + * To avoid such false positives, the method can call this routine to + * drop the lockdep annotation indicating that the attribute's dirent is + * currently locked, remove the child's attribute, and then reacquire + * the annotation. + */ +void sysfs_unannotate_readlock(struct sysfs_dirent *sd) +{ + rwsem_release(&sd->dep_map, 1, _RET_IP_); +} + +/** + * sysfs_reannotate_readlock - reacquire the lockdep annotation for an attribute + * @sd: sysfs_dirent for the attribute whose lock annotation is reacquired + * + * After an attribute method has called sysfs_unannotate_readlock() + * and removed the attributes from various child devices, it should + * call this routine to reacquire the lockdep annotation (and also to + * drop the reference to @sd obtained by + * sysfs_get_dirent_for_attribute()) before returning. + */ +void sysfs_reannotate_readlock(struct sysfs_dirent *sd) +{ + rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_); + sysfs_put(sd); +} + +/** * sysfs_get_active - get an active reference to sysfs_dirent * @sd: sysfs_dirent to get an active reference to * Index: usb-3.4/drivers/base/core.c =================================================================== --- usb-3.4.orig/drivers/base/core.c +++ usb-3.4/drivers/base/core.c @@ -609,6 +609,48 @@ int device_schedule_callback_owner(struc } EXPORT_SYMBOL_GPL(device_schedule_callback_owner); +/** + * device_start_attribute_infanticide - prepare to remove child attributes + * @dev: device passed to the calling method + * @attr: attribute passed to the calling method + * @group: name of the group containing @attr, or NULL + * + * To avoid false positive reports from lockdep, this routine should be + * called by an attribute method that wants to remove the same attribute + * from a descendant of the device for which it was called (an + * "infanticidal" attribute). + * + * The returned value should be treated as an opaque cookie and passed to + * device_end_attribute_infanticide() below after the descendant + * attributes have been removed. + */ +void *device_start_attribute_infanticide( + struct device *dev, const struct device_attribute *attr, + const char *group) +{ + struct sysfs_dirent *sd; + + sd = sysfs_find_dirent_for_attr(&dev->kobj, &attr->attr, group); + if (sd) + sysfs_unannotate_readlock(sd); + return sd; +} +EXPORT_SYMBOL_GPL(device_start_attribute_infanticide); + +/** + * device_end_attribute_infanticide - finish removing child attributes + * @cookie: value returned by device_start_attribute_infanticide() above + * + * This routine should be called by attribute methods after they have + * finished removing the corresponding attributes from descendant devices. + */ +void device_end_attribute_infanticide(void *cookie) +{ + if (cookie) + sysfs_reannotate_readlock(cookie); +} +EXPORT_SYMBOL_GPL(device_end_attribute_infanticide); + static void klist_children_get(struct klist_node *n) { struct device_private *p = to_device_private_parent(n); Index: usb-3.4/drivers/usb/core/sysfs.c =================================================================== --- usb-3.4.orig/drivers/usb/core/sysfs.c +++ usb-3.4/drivers/usb/core/sysfs.c @@ -64,11 +64,14 @@ set_bConfigurationValue(struct device *d { struct usb_device *udev = to_usb_device(dev); int config, value; + void *cookie; if (sscanf(buf, "%d", &config) != 1 || config < -1 || config > 255) return -EINVAL; usb_lock_device(udev); + cookie = device_start_attribute_infanticide(dev, attr, NULL); value = usb_set_configuration(udev, config); + device_end_attribute_infanticide(cookie); usb_unlock_device(udev); return (value < 0) ? value : count; } @@ -588,10 +591,15 @@ static ssize_t usb_dev_authorized_store( result = sscanf(buf, "%u\n", &val); if (result != 1) result = -EINVAL; - else if (val == 0) + else if (val == 0) { + void *cookie; + + cookie = device_start_attribute_infanticide(dev, attr, NULL); result = usb_deauthorize_device(usb_dev); - else + device_end_attribute_infanticide(cookie); + } else { result = usb_authorize_device(usb_dev); + } return result < 0? result : size; } @@ -605,12 +613,15 @@ static ssize_t usb_remove_store(struct d { struct usb_device *udev = to_usb_device(dev); int rc = 0; + void *cookie; usb_lock_device(udev); if (udev->state != USB_STATE_NOTATTACHED) { /* To avoid races, first unconfigure and then remove */ + cookie = device_start_attribute_infanticide(dev, attr, NULL); usb_set_configuration(udev, -1); + device_end_attribute_infanticide(cookie); rc = usb_remove_device(udev); } if (rc == 0) ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-05-04 19:08 ` Alan Stern @ 2012-05-07 19:46 ` Tejun Heo 2012-05-07 21:51 ` Alan Stern 0 siblings, 1 reply; 24+ messages in thread From: Tejun Heo @ 2012-05-07 19:46 UTC (permalink / raw) To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list Hello, Alan. On Fri, May 04, 2012 at 03:08:53PM -0400, Alan Stern wrote: > @@ -588,10 +591,15 @@ static ssize_t usb_dev_authorized_store( > result = sscanf(buf, "%u\n", &val); > if (result != 1) > result = -EINVAL; > - else if (val == 0) > + else if (val == 0) { > + void *cookie; > + > + cookie = device_start_attribute_infanticide(dev, attr, NULL); > result = usb_deauthorize_device(usb_dev); > - else > + device_end_attribute_infanticide(cookie); > + } else { > result = usb_authorize_device(usb_dev); > + } I *think* it looks way too huge as lockdep workaround. We're adding a whole separate lookup interface for this. If looking up afterwards is difficult, can't we get away with adding a field in struct attribute? Thanks. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-05-07 19:46 ` Tejun Heo @ 2012-05-07 21:51 ` Alan Stern 2012-05-07 21:55 ` Tejun Heo 0 siblings, 1 reply; 24+ messages in thread From: Alan Stern @ 2012-05-07 21:51 UTC (permalink / raw) To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Mon, 7 May 2012, Tejun Heo wrote: > Hello, Alan. Hi. > On Fri, May 04, 2012 at 03:08:53PM -0400, Alan Stern wrote: > > @@ -588,10 +591,15 @@ static ssize_t usb_dev_authorized_store( > > result = sscanf(buf, "%u\n", &val); > > if (result != 1) > > result = -EINVAL; > > - else if (val == 0) > > + else if (val == 0) { > > + void *cookie; > > + > > + cookie = device_start_attribute_infanticide(dev, attr, NULL); > > result = usb_deauthorize_device(usb_dev); > > - else > > + device_end_attribute_infanticide(cookie); > > + } else { > > result = usb_authorize_device(usb_dev); > > + } > > I *think* it looks way too huge as lockdep workaround. We're adding a Well, it's not _terribly_ huge. As far as the driver is concerned, it's just a start and an end call. The lookup interface isn't even EXPORTed, because its only user is the device core. > whole separate lookup interface for this. If looking up afterwards is > difficult, can't we get away with adding a field in struct attribute? You mean, an "ignore this attribute for lockdep purposes" flag? Yes, that would work just as well. I guess in the end it's a question of balance. Which has more overhead, adding a few function calls here and there, or adding a new flags field to every struct attribute? Alan Stern ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-05-07 21:51 ` Alan Stern @ 2012-05-07 21:55 ` Tejun Heo 2012-05-08 18:53 ` Alan Stern 0 siblings, 1 reply; 24+ messages in thread From: Tejun Heo @ 2012-05-07 21:55 UTC (permalink / raw) To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote: > I guess in the end it's a question of balance. Which has more > overhead, adding a few function calls here and there, or adding a new > flags field to every struct attribute? Yes, and there are different types of overheads. I'm happy to trade some runtime memory overhead under debugging mode for lower code complexity. Lock proving is pretty expensive anyway. I don't think there's much point in trying to optimize some bytes from struct attributes. Thanks. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-05-07 21:55 ` Tejun Heo @ 2012-05-08 18:53 ` Alan Stern 2012-05-09 17:41 ` Tejun Heo 0 siblings, 1 reply; 24+ messages in thread From: Alan Stern @ 2012-05-08 18:53 UTC (permalink / raw) To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Mon, 7 May 2012, Tejun Heo wrote: > On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote: > > I guess in the end it's a question of balance. Which has more > > overhead, adding a few function calls here and there, or adding a new > > flags field to every struct attribute? > > Yes, and there are different types of overheads. I'm happy to trade > some runtime memory overhead under debugging mode for lower code > complexity. Lock proving is pretty expensive anyway. I don't think > there's much point in trying to optimize some bytes from struct > attributes. Okay, then what do you think about this approach? It does seem smaller and simpler than the previous attempt. And I did try to avoid unnecessary bloat; if lockdep isn't being used then the extra attribute flag isn't present. Alan Stern Index: usb-3.4/include/linux/sysfs.h =================================================================== --- usb-3.4.orig/include/linux/sysfs.h +++ usb-3.4/include/linux/sysfs.h @@ -27,6 +27,7 @@ struct attribute { const char *name; umode_t mode; #ifdef CONFIG_DEBUG_LOCK_ALLOC + bool ignore_lockdep:1; struct lock_class_key *key; struct lock_class_key skey; #endif @@ -80,6 +81,17 @@ struct attribute_group { #define __ATTR_NULL { .attr = { .name = NULL } } +#ifdef CONFIG_DEBUG_LOCK_ALLOC +#define __ATTR_IGNORE_LOCKDEP(_name,_mode,_show,_store) { \ + .attr = {.name = __stringify(_name), .mode = _mode, \ + .ignore_lockdep = true }, \ + .show = _show, \ + .store = _store, \ +} +#else +#define __ATTR_IGNORE_LOCKDEP __ATTR +#endif + #define attr_name(_attr) (_attr).attr.name struct file; Index: usb-3.4/fs/sysfs/dir.c =================================================================== --- usb-3.4.orig/fs/sysfs/dir.c +++ usb-3.4/fs/sysfs/dir.c @@ -132,6 +132,24 @@ static void sysfs_unlink_sibling(struct rb_erase(&sd->s_rb, &sd->s_parent->s_dir.children); } +#ifdef CONFIG_DEBUG_LOCK_ALLOC + +/* Test for attributes that want to ignore lockdep for read-locking */ +static bool ignore_lockdep(struct sysfs_dirent *sd) +{ + return sysfs_type(sd) == SYSFS_KOBJ_ATTR && + sd->s_attr.attr->ignore_lockdep; +} + +#else + +static inline bool ignore_lockdep(struct sysfs_dirent *sd) +{ + return true; +} + +#endif + /** * sysfs_get_active - get an active reference to sysfs_dirent * @sd: sysfs_dirent to get an active reference to @@ -155,15 +173,17 @@ struct sysfs_dirent *sysfs_get_active(st return NULL; t = atomic_cmpxchg(&sd->s_active, v, v + 1); - if (likely(t == v)) { - rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_); - return sd; - } + if (likely(t == v)) + break; if (t < 0) return NULL; cpu_relax(); } + + if (likely(!ignore_lockdep(sd))) + rwsem_acquire_read(&sd->dep_map, 0, 1, _RET_IP_); + return sd; } /** @@ -180,7 +200,8 @@ void sysfs_put_active(struct sysfs_diren if (unlikely(!sd)) return; - rwsem_release(&sd->dep_map, 1, _RET_IP_); + if (likely(!ignore_lockdep(sd))) + rwsem_release(&sd->dep_map, 1, _RET_IP_); v = atomic_dec_return(&sd->s_active); if (likely(v != SD_DEACTIVATED_BIAS)) return; Index: usb-3.4/include/linux/device.h =================================================================== --- usb-3.4.orig/include/linux/device.h +++ usb-3.4/include/linux/device.h @@ -503,6 +503,9 @@ ssize_t device_store_int(struct device * #define DEVICE_INT_ATTR(_name, _mode, _var) \ struct dev_ext_attribute dev_attr_##_name = \ { __ATTR(_name, _mode, device_show_ulong, device_store_ulong), &(_var) } +#define DEVICE_ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store) \ + struct device_attribute dev_attr_##_name = \ + __ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store) extern int device_create_file(struct device *device, const struct device_attribute *entry); Index: usb-3.4/drivers/usb/core/sysfs.c =================================================================== --- usb-3.4.orig/drivers/usb/core/sysfs.c +++ usb-3.4/drivers/usb/core/sysfs.c @@ -73,7 +73,7 @@ set_bConfigurationValue(struct device *d return (value < 0) ? value : count; } -static DEVICE_ATTR(bConfigurationValue, S_IRUGO | S_IWUSR, +static DEVICE_ATTR_IGNORE_LOCKDEP(bConfigurationValue, S_IRUGO | S_IWUSR, show_bConfigurationValue, set_bConfigurationValue); /* String fields */ @@ -595,7 +595,7 @@ static ssize_t usb_dev_authorized_store( return result < 0? result : size; } -static DEVICE_ATTR(authorized, 0644, +static DEVICE_ATTR_IGNORE_LOCKDEP(authorized, 0644, usb_dev_authorized_show, usb_dev_authorized_store); /* "Safely remove a device" */ @@ -618,7 +618,7 @@ static ssize_t usb_remove_store(struct d usb_unlock_device(udev); return rc; } -static DEVICE_ATTR(remove, 0200, NULL, usb_remove_store); +static DEVICE_ATTR_IGNORE_LOCKDEP(remove, 0200, NULL, usb_remove_store); static struct attribute *dev_attrs[] = { ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-05-08 18:53 ` Alan Stern @ 2012-05-09 17:41 ` Tejun Heo 2012-05-09 17:47 ` Alan Stern 0 siblings, 1 reply; 24+ messages in thread From: Tejun Heo @ 2012-05-09 17:41 UTC (permalink / raw) To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list Hello, On Tue, May 08, 2012 at 02:53:11PM -0400, Alan Stern wrote: > On Mon, 7 May 2012, Tejun Heo wrote: > > > On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote: > > > I guess in the end it's a question of balance. Which has more > > > overhead, adding a few function calls here and there, or adding a new > > > flags field to every struct attribute? > > > > Yes, and there are different types of overheads. I'm happy to trade > > some runtime memory overhead under debugging mode for lower code > > complexity. Lock proving is pretty expensive anyway. I don't think > > there's much point in trying to optimize some bytes from struct > > attributes. > > Okay, then what do you think about this approach? It does seem smaller > and simpler than the previous attempt. > > And I did try to avoid unnecessary bloat; if lockdep isn't being used > then the extra attribute flag isn't present. Yeap, looks good to me. Thanks. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-05-09 17:41 ` Tejun Heo @ 2012-05-09 17:47 ` Alan Stern 2012-05-09 17:48 ` Tejun Heo 0 siblings, 1 reply; 24+ messages in thread From: Alan Stern @ 2012-05-09 17:47 UTC (permalink / raw) To: Tejun Heo; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Wed, 9 May 2012, Tejun Heo wrote: > Hello, > > On Tue, May 08, 2012 at 02:53:11PM -0400, Alan Stern wrote: > > On Mon, 7 May 2012, Tejun Heo wrote: > > > > > On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote: > > > > I guess in the end it's a question of balance. Which has more > > > > overhead, adding a few function calls here and there, or adding a new > > > > flags field to every struct attribute? > > > > > > Yes, and there are different types of overheads. I'm happy to trade > > > some runtime memory overhead under debugging mode for lower code > > > complexity. Lock proving is pretty expensive anyway. I don't think > > > there's much point in trying to optimize some bytes from struct > > > attributes. > > > > Okay, then what do you think about this approach? It does seem smaller > > and simpler than the previous attempt. > > > > And I did try to avoid unnecessary bloat; if lockdep isn't being used > > then the extra attribute flag isn't present. > > Yeap, looks good to me. Unless there are any objections from Eric or Peter in the next few days, I'll submit it. Can I add your Acked-by? Alan Stern ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-05-09 17:47 ` Alan Stern @ 2012-05-09 17:48 ` Tejun Heo 0 siblings, 0 replies; 24+ messages in thread From: Tejun Heo @ 2012-05-09 17:48 UTC (permalink / raw) To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Wed, May 09, 2012 at 01:47:34PM -0400, Alan Stern wrote: > On Wed, 9 May 2012, Tejun Heo wrote: > > > Hello, > > > > On Tue, May 08, 2012 at 02:53:11PM -0400, Alan Stern wrote: > > > On Mon, 7 May 2012, Tejun Heo wrote: > > > > > > > On Mon, May 07, 2012 at 05:51:52PM -0400, Alan Stern wrote: > > > > > I guess in the end it's a question of balance. Which has more > > > > > overhead, adding a few function calls here and there, or adding a new > > > > > flags field to every struct attribute? > > > > > > > > Yes, and there are different types of overheads. I'm happy to trade > > > > some runtime memory overhead under debugging mode for lower code > > > > complexity. Lock proving is pretty expensive anyway. I don't think > > > > there's much point in trying to optimize some bytes from struct > > > > attributes. > > > > > > Okay, then what do you think about this approach? It does seem smaller > > > and simpler than the previous attempt. > > > > > > And I did try to avoid unnecessary bloat; if lockdep isn't being used > > > then the extra attribute flag isn't present. > > > > Yeap, looks good to me. > > Unless there are any objections from Eric or Peter in the next few > days, I'll submit it. Can I add your Acked-by? Sure. Thanks for the persistence. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-27 15:57 ` Alan Stern 2012-04-27 16:09 ` Tejun Heo @ 2012-04-27 16:27 ` Eric W. Biederman 2012-04-27 18:27 ` Alan Stern 1 sibling, 1 reply; 24+ messages in thread From: Eric W. Biederman @ 2012-04-27 16:27 UTC (permalink / raw) To: Alan Stern; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list Alan Stern <stern@rowland.harvard.edu> writes: > On Thu, 26 Apr 2012, Tejun Heo wrote: > >> On Thu, Apr 26, 2012 at 02:14:30PM -0400, Alan Stern wrote: >> > > > Hmmm.... This happens because, by default, sysfs_dirents for the same >> > > > attr share the same lockdep key. This happens from >> > > > sysfs_dirent_init_lockdep(). Hmm.... we can, >> > > > >> > > > * Somehow assign different keys to sysfs_dirents for the specific >> > > > attr. Use array of attrs indexed by bus depth? >> > > >> > > Possible with sysfs_attr_init but pretty ugly. Especially since it >> > > sounds like this is a situation that does not presuppose a maximum >> > > depth. I do remember that the lockdep keys must be statically allocated >> > > which makes this a challenge. >> >> The depth is limited by USB spec. >> >> > I agree; this doesn't seem like a good approach. >> >> It sure isn't pretty but probably best matches the situation in the >> sense that lockdep would actually be able to know about the nesting >> going on. > > By the way, do you know why attribute structures allow for dynamic keys > as well as static keys? I see dynamic keys are used by attribute > containers, but I don't understand why. Some attributes are allocated dynamically, but they need static keys. It is a lockdep requirement. So for dynamically allocated attributes we have to smack them so they have static keys. Almost all sysfs attributes are declared statically so the optimization of taking the key from the attribute structure saves modifying a lot of code and makes the code that exists easy to get correct. Dynamically allocated attributes break in a pretty obvious way when you try to use them with lockdep enabled so I think we have a reasonable solution. >> > Another idea is to have A's method temporarily drop the sysfs readlock. >> > Of course that would put the onus on the USB core of guaranteeing that >> > A cannot be removed while this happens, but we can handle that. >> >> Yeah, that's an easier way out. Please make it a proper sysfs API >> call tho so that people working on sysfs later can know of the special >> case. > > I will. > > Would it be better to release just the lockdep annotation while > continuing to hold the actual lock, or to really drop the lock? We can't really drop the ``lock''. That would imply not waiting for all of the methods using a sysfs attributes not to finish before we remove a sysfs attribute. Kaboom! Probably what would be better would figure out how to sneak in something that for this file we tell lockdep to ignore it like you do the device locks. As you do for the device layer locking. I don't like the choices available at this junction. I have seen some nasty ABBA deadlocks with sysfs, and anything that makes those easier to see. One thing that looks promising after reading lwn yesterday is what Al and Oleg are doing with fput and a per task work queue that runs before we return to user space. If you could use that we could have our cake and eat it too. You could schedule the work in a work queue but you could also be guaranteed that there are not any locking problems. Do you think you could investigate that possibility? Eric ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-27 16:27 ` Eric W. Biederman @ 2012-04-27 18:27 ` Alan Stern 2012-04-27 20:17 ` Tejun Heo 0 siblings, 1 reply; 24+ messages in thread From: Alan Stern @ 2012-04-27 18:27 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list On Fri, 27 Apr 2012, Eric W. Biederman wrote: > >> > Another idea is to have A's method temporarily drop the sysfs readlock. > >> > Of course that would put the onus on the USB core of guaranteeing that > >> > A cannot be removed while this happens, but we can handle that. > >> > >> Yeah, that's an easier way out. Please make it a proper sysfs API > >> call tho so that people working on sysfs later can know of the special > >> case. > > > > I will. > > > > Would it be better to release just the lockdep annotation while > > continuing to hold the actual lock, or to really drop the lock? > > We can't really drop the ``lock''. That would imply not waiting for > all of the methods using a sysfs attributes not to finish before we > remove a sysfs attribute. Kaboom! That's why I wrote above "... would put the onus on the USB core of guaranteeing that A cannot be removed while this happens". But regardless, we can keep the lock while dropping the annotation. > Probably what would be better would figure out how to sneak in something > that for this file we tell lockdep to ignore it like you do the device > locks. As you do for the device layer locking. Dropping the annotation has pretty much that effect, right? I don't want lockdep to ignore the locks entirely, because we still want to catch attributes that unregister themselves (as opposed to unregistering their counterparts in child devices). > I don't like the choices available at this junction. I have seen some > nasty ABBA deadlocks with sysfs, and anything that makes those easier to > see. > > One thing that looks promising after reading lwn yesterday is what Al > and Oleg are doing with fput and a per task work queue that runs before > we return to user space. If you could use that we could have our > cake and eat it too. You could schedule the work in a work queue but > you could also be guaranteed that there are not any locking problems. > > Do you think you could investigate that possibility? Yes, I think that would work. It would require a little more code and some errors would not get reported correctly (those that occur during the callback). (Incidentally, the sysfs_workqueue used in sysfs_schedule_callback() could also be replaced by the per-task work queue.) Which do you prefer: temporarily dropping the lockdep annotation, or deferring the work to the per-task work queue? Note that these "flush before returning to userspace" things aren't in the kernel yet. Alan Stern ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-27 18:27 ` Alan Stern @ 2012-04-27 20:17 ` Tejun Heo 2012-04-27 21:09 ` Eric W. Biederman 0 siblings, 1 reply; 24+ messages in thread From: Tejun Heo @ 2012-04-27 20:17 UTC (permalink / raw) To: Alan Stern; +Cc: Eric W. Biederman, Peter Zijlstra, Kernel development list On Fri, Apr 27, 2012 at 02:27:26PM -0400, Alan Stern wrote: > Which do you prefer: temporarily dropping the lockdep annotation, or > deferring the work to the per-task work queue? Note that these "flush > before returning to userspace" things aren't in the kernel yet. It's about lockdep. Let's not spill out to other areas unnecessarily. I'd much prefer just working around lockdep annotation. Thanks. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-27 20:17 ` Tejun Heo @ 2012-04-27 21:09 ` Eric W. Biederman 2012-04-27 21:16 ` Tejun Heo 2012-04-29 2:00 ` Alan Stern 0 siblings, 2 replies; 24+ messages in thread From: Eric W. Biederman @ 2012-04-27 21:09 UTC (permalink / raw) To: Tejun Heo; +Cc: Alan Stern, Peter Zijlstra, Kernel development list Tejun Heo <tj@kernel.org> writes: > On Fri, Apr 27, 2012 at 02:27:26PM -0400, Alan Stern wrote: >> Which do you prefer: temporarily dropping the lockdep annotation, or >> deferring the work to the per-task work queue? Note that these "flush >> before returning to userspace" things aren't in the kernel yet. > > It's about lockdep. Let's not spill out to other areas unnecessarily. > I'd much prefer just working around lockdep annotation. It is also about a locking scheme that makes analysis hard. Doing something so we don't have false lockdep warnings is good so people can concentrate on things that are clearly problems. However it would be nice if we could sort out the locking so that it isn't so tricky that neither lockdep nor sparse can figure it out. I have the sneaking suspicion that idioms that tangle up our automatic tools are also idioms that are likely to result in maintenance problems at some point. Another possibility to to look at the situation and realize that pci has a maxium depth of 256 (bus numbers). And that usb also has a maxium depth of I believe 256 ( If I read it right usb hubs are transparent to usb enumeration so the maximum depth is the maximum number of usb ids and I think the usb id is a 8 bit number). I don't think anything else even nests so deeply. So it may be reasonable to declare an array of 256 or perhaps 1024 lockdep keys and limit the device tree when lockdep is enabled to 1024 layers deep. At which point we are at a point where lockdep can actually analyze the behavior. I don't mind the attitude we are clever careful programmers we can handle the complexity and we can get away without the tool help us, but I would much rather see the attitude that we are clever careful programmers and we can figure out how to make the tool help us instead of just ignoring it. Eric ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-27 21:09 ` Eric W. Biederman @ 2012-04-27 21:16 ` Tejun Heo 2012-04-29 2:00 ` Alan Stern 1 sibling, 0 replies; 24+ messages in thread From: Tejun Heo @ 2012-04-27 21:16 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Alan Stern, Peter Zijlstra, Kernel development list On Fri, Apr 27, 2012 at 02:09:22PM -0700, Eric W. Biederman wrote: > I don't mind the attitude we are clever careful programmers we can > handle the complexity and we can get away without the tool help us, but > I would much rather see the attitude that we are clever careful > programmers and we can figure out how to make the tool help us instead > of just ignoring it. I'm okay with using static array of keys so that each level maps to separate key or just calling it a special case and ignoring it, but I think it's quite silly to use an async mechanism just to avoid lockdep warning. Things like that tend to make things obscure as people generally don't expect lockdep annotation to dictate overall behaviors. It is an annotation problem. Let's keep it that way. Thanks. -- tejun ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-27 21:09 ` Eric W. Biederman 2012-04-27 21:16 ` Tejun Heo @ 2012-04-29 2:00 ` Alan Stern 2012-04-29 2:35 ` Eric W. Biederman 1 sibling, 1 reply; 24+ messages in thread From: Alan Stern @ 2012-04-29 2:00 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list On Fri, 27 Apr 2012, Eric W. Biederman wrote: > However it would be nice if we could sort out the locking so that it > isn't so tricky that neither lockdep nor sparse can figure it out. > > I have the sneaking suspicion that idioms that tangle up our automatic > tools are also idioms that are likely to result in maintenance problems > at some point. That may well be true, but it won't be easy to avoid them. At the least, it would require a careful analysis of the device tree usage. > Another possibility to to look at the situation and realize that pci has > a maxium depth of 256 (bus numbers). And that usb also has a maxium > depth of I believe 256 ( If I read it right usb hubs are transparent to > usb enumeration so the maximum depth is the maximum number of usb ids > and I think the usb id is a 8 bit number). USB has a maximum depth of 7 or so. It's limited by the number of hubs allowed on the path between the host and a device. > I don't think anything else > even nests so deeply. So it may be reasonable to declare an array of > 256 or perhaps 1024 lockdep keys and limit the device tree when lockdep > is enabled to 1024 layers deep. > > At which point we are at a point where lockdep can actually analyze the > behavior. Unfortunately, we are not. As I mentioned earlier, the device "tree" is really a forest. Locks are sometimes acquired in orders that are not strictly downward. > I don't mind the attitude we are clever careful programmers we can > handle the complexity and we can get away without the tool help us, but > I would much rather see the attitude that we are clever careful > programmers and we can figure out how to make the tool help us instead > of just ignoring it. I'm certainly open to suggestions as to how to improve the situation, but the simple-minded "keep track of the depth in the tree" approach doesn't work. Alan Stern ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Lockdep false positive in sysfs 2012-04-29 2:00 ` Alan Stern @ 2012-04-29 2:35 ` Eric W. Biederman 0 siblings, 0 replies; 24+ messages in thread From: Eric W. Biederman @ 2012-04-29 2:35 UTC (permalink / raw) To: Alan Stern; +Cc: Tejun Heo, Peter Zijlstra, Kernel development list Alan Stern <stern@rowland.harvard.edu> writes: > On Fri, 27 Apr 2012, Eric W. Biederman wrote: > >> However it would be nice if we could sort out the locking so that it >> isn't so tricky that neither lockdep nor sparse can figure it out. >> >> I have the sneaking suspicion that idioms that tangle up our automatic >> tools are also idioms that are likely to result in maintenance problems >> at some point. > > That may well be true, but it won't be easy to avoid them. At the > least, it would require a careful analysis of the device tree usage. > >> Another possibility to to look at the situation and realize that pci has >> a maxium depth of 256 (bus numbers). And that usb also has a maxium >> depth of I believe 256 ( If I read it right usb hubs are transparent to >> usb enumeration so the maximum depth is the maximum number of usb ids >> and I think the usb id is a 8 bit number). > > USB has a maximum depth of 7 or so. It's limited by the number of hubs > allowed on the path between the host and a device. > >> I don't think anything else >> even nests so deeply. So it may be reasonable to declare an array of >> 256 or perhaps 1024 lockdep keys and limit the device tree when lockdep >> is enabled to 1024 layers deep. >> >> At which point we are at a point where lockdep can actually analyze the >> behavior. > > Unfortunately, we are not. As I mentioned earlier, the device "tree" > is really a forest. Locks are sometimes acquired in orders that are > not strictly downward. Then it sounds like for the device tree in general this is a limitation. For sysfs with your problem usb attribute this looks like a real solution. >> I don't mind the attitude we are clever careful programmers we can >> handle the complexity and we can get away without the tool help us, but >> I would much rather see the attitude that we are clever careful >> programmers and we can figure out how to make the tool help us instead >> of just ignoring it. > > I'm certainly open to suggestions as to how to improve the situation, > but the simple-minded "keep track of the depth in the tree" approach > doesn't work. For the general device tree perhaps. For the problem of your removal of your sysfs attribute the solution should be sufficient. I don't particularly like it because it will take a lot of testing to find any bugs that lockdep might reveal but at least you will have a fighting chance. I would like to look at the general problem of the device tree locking. Unfortunately I am tilting at enough other windmills right now that I can't do that problem justice. Eric ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2012-05-09 17:49 UTC | newest] Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-04-25 18:58 Lockdep false positive in sysfs Alan Stern 2012-04-25 21:59 ` Tejun Heo 2012-04-26 8:16 ` Eric W. Biederman 2012-04-26 18:14 ` Alan Stern 2012-04-26 22:17 ` Tejun Heo 2012-04-27 15:57 ` Alan Stern 2012-04-27 16:09 ` Tejun Heo 2012-05-03 21:30 ` Alan Stern 2012-05-04 16:52 ` Tejun Heo 2012-05-04 19:08 ` Alan Stern 2012-05-07 19:46 ` Tejun Heo 2012-05-07 21:51 ` Alan Stern 2012-05-07 21:55 ` Tejun Heo 2012-05-08 18:53 ` Alan Stern 2012-05-09 17:41 ` Tejun Heo 2012-05-09 17:47 ` Alan Stern 2012-05-09 17:48 ` Tejun Heo 2012-04-27 16:27 ` Eric W. Biederman 2012-04-27 18:27 ` Alan Stern 2012-04-27 20:17 ` Tejun Heo 2012-04-27 21:09 ` Eric W. Biederman 2012-04-27 21:16 ` Tejun Heo 2012-04-29 2:00 ` Alan Stern 2012-04-29 2:35 ` Eric W. Biederman
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.