linux-kernel.vger.kernel.org archive mirror
* Re: 2.6.21-rc suspend regression: sysfs deadlock
@ 2007-03-10 20:44 Alan Stern
  0 siblings, 0 replies; 30+ messages in thread
From: Alan Stern @ 2007-03-10 20:44 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Linus Torvalds, Dmitry Torokhov, Maneesh Soni, gregkh,
	James Bottomley, Kernel development list

[For the start of this thread, see 
<http://marc.theaimsgroup.com/?l=linux-kernel&m=117320893726621&w=2>.]

On Wed, 7 Mar 2007, Linus Torvalds wrote:

> So you just pointed to *another* data structure that apparently violates 
> the "you MUST use refcounting" rule.
> 
> What is it with you people? It's really simple. Data structures must be 
> refcounted if you can reach them two different ways.
> 
> If you don't use refcounting, then you'd better make sure that the data 
> can be reached only one way (for example, by *not* exposing it for sysfs).
> 
> It really *is* that simple. Read the CodingStyle rules.

Linus's analysis is correct as far as it goes, but it misses some very 
important points.  The _real_ problem here, which nobody has pointed out 
so far, is not device removal or driver unloading.  It is driver 
unbinding -- with its consequent issue of access rights.

When a driver is unbound from a device, when should the driver stop trying 
to access that device?  The obvious answer is that it must stop before its 
release() method returns.  Otherwise the device might get bound to 
another driver and we would have both drivers trying to talk to it at the 
same time.

In other words, when a driver unbinds from a device, it loses its right to
access that device.  Same goes for any device-related data structures that
weren't created by the driver itself.  When you realize this, it becomes
obvious that the driver faces a synchronization problem.  All its entry
points must be synchronized with release(), to avoid races.

So there actually are two things a driver has to worry about:

	The lifetime of its private data structures (which can be solved
	using refcounts as Linus advocated);

	The race between release() and other activities (which cannot
	be solved by refcounts but needs a true synchronization technique,
	such as a mutex).

No doubt some of this sounds familiar; the race between open() and
disconnect() for char device drivers is one we have faced many times and
not always solved perfectly.  Also note that this is a fundamental
problem, affecting many facilities in addition to sysfs.
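
The two concerns listed above can be illustrated with a minimal userspace
sketch.  It is not kernel code: struct demo_priv and every demo_* function
are hypothetical names invented for this illustration, and a pthread mutex
stands in for whatever lock a real driver would use.  The refcount only
decides when the memory is freed; the "bound" flag, checked under the
mutex, decides whether an entry point may still touch the device.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical private data of an imaginary driver. */
struct demo_priv {
	pthread_mutex_t lock;	/* serializes entry points against release() */
	int refcount;		/* protected by lock; governs lifetime only */
	bool bound;		/* protected by lock; governs access rights */
	int device_accesses;	/* stands in for real device I/O */
};

static struct demo_priv *demo_alloc(void)
{
	struct demo_priv *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	pthread_mutex_init(&p->lock, NULL);
	p->refcount = 1;
	p->bound = true;
	return p;
}

static void demo_get(struct demo_priv *p)
{
	pthread_mutex_lock(&p->lock);
	p->refcount++;
	pthread_mutex_unlock(&p->lock);
}

/* Returns true if this call dropped the last reference and freed p. */
static bool demo_put(struct demo_priv *p)
{
	bool last;

	pthread_mutex_lock(&p->lock);
	assert(p->refcount > 0);
	last = (--p->refcount == 0);
	pthread_mutex_unlock(&p->lock);
	if (last) {
		pthread_mutex_destroy(&p->lock);
		free(p);
	}
	return last;
}

/* An entry point such as show()/store(): 0 on success, -1 once unbound. */
static int demo_entry_point(struct demo_priv *p)
{
	int rc = -1;

	pthread_mutex_lock(&p->lock);
	if (p->bound) {
		p->device_accesses++;	/* device access happens under the lock */
		rc = 0;
	}
	pthread_mutex_unlock(&p->lock);
	return rc;
}

/* release(): once this returns, no entry point touches the device again. */
static void demo_release(struct demo_priv *p)
{
	pthread_mutex_lock(&p->lock);
	p->bound = false;
	pthread_mutex_unlock(&p->lock);
}
```

In this sketch a caller that still holds a reference after demo_release()
can safely call demo_entry_point(); the memory is valid, but the call is
refused because the access right is gone.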


One way to solve these problems is to put all the responsibility on the 
driver.  Make it refcount its data structures and use mutexes.  This is 
not very attractive for several reasons:

	_Lots_ of drivers are affected.  Pretty much any driver which
	registers a char device or a sysfs attribute file.

	_Lots_ of code would need to be changed, adding considerable
	bloat.  Every show()/store() method would need to acquire a mutex,
	and many would need to be passed an additional argument, requiring
	a change in the sysfs API.  (I can explain why in a follow-up 
	email if anyone is interested.)

	Most importantly, doing all the refcounting and mutual exclusion
	correctly is quite hard.  It's amazingly easy to make mistakes
	in these areas.  The chances of getting it right while changing
	multiple functions in every single driver are infinitesimal.

Another approach is to put all the responsibility on the core subsystems
that handle driver registration.  They should rigidly enforce two
principles: "No driver callbacks occur after unregistration" and its
prerequisite, "Unregistration is mutually exclusive with driver
callbacks".  (This is exactly what Oliver's original patch did for sysfs.)

	The number of core subsystems affected is much smaller than the
	total number of drivers.  Sysfs, debugfs, the char device
	subsystem, maybe a few others.

	Drivers would no longer have to worry about doing their own
	synchronization or refcounts.  It would be guaranteed that a
	private data structure would never be accessed from sysfs after
	device_remove_file() returned, so the structure could safely and
	easily be deallocated as part of release().

At the expense of complicating a few central subsystems, we could simplify
a lot of drivers.  I think this is a worthwhile tradeoff.

It does have a small disadvantage; it means that an entry point would
deadlock if it tried to unregister itself.  (The example which started
this whole thread was sdev_store_delete() in the SCSI core.  Writing to
that attribute unregisters the device to which it belongs.)  Clearly the
actual unregistration would have to be performed separately in a workqueue.  
I think the number of places where this occurs is pretty small.
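
The self-unregistration problem and the deferral that avoids it can be
modelled in a few lines of single-threaded userspace C.  This is only a
sketch under stated assumptions: struct demo_attr, the demo_* functions and
the one-slot "pending" queue are all hypothetical, and where real code
would block forever the model returns -EDEADLK so that the behaviour can be
observed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical attribute; "active" counts callbacks currently running. */
struct demo_attr {
	bool registered;
	int active;
};

/* One-slot stand-in for a workqueue. */
static void (*pending_func)(struct demo_attr *);
static struct demo_attr *pending_arg;

/* Unregistration must wait for running callbacks to finish.  A caller
 * that is itself such a callback would wait forever, which the model
 * reports as -EDEADLK instead of actually hanging. */
static int demo_unregister(struct demo_attr *a)
{
	if (a->active > 0)
		return -EDEADLK;
	a->registered = false;
	return 0;
}

/* Work item: performs the unregistration outside any callback. */
static void demo_unregister_work(struct demo_attr *a)
{
	int rc = demo_unregister(a);

	assert(rc == 0);
	(void)rc;
}

/* Queue func(a) to run later; -EBUSY if the single slot is occupied. */
static int demo_defer(void (*func)(struct demo_attr *), struct demo_attr *a)
{
	if (pending_func)
		return -EBUSY;
	pending_func = func;
	pending_arg = a;
	return 0;
}

/* Run the queued work item, if any, from a neutral context. */
static void demo_run_pending(void)
{
	void (*func)(struct demo_attr *) = pending_func;

	if (!func)
		return;
	pending_func = NULL;
	func(pending_arg);
}

/* store() method that tries to remove its own attribute directly. */
static int demo_store_direct(struct demo_attr *a)
{
	return demo_unregister(a);
}

/* store() method that hands the removal to the deferred queue. */
static int demo_store_deferred(struct demo_attr *a)
{
	return demo_defer(demo_unregister_work, a);
}

/* Core-side dispatch: no callbacks after unregistration, and the
 * callback is counted as active for as long as it runs. */
static int demo_call_store(struct demo_attr *a,
		int (*store)(struct demo_attr *))
{
	int rc;

	if (!a->registered)
		return -ENODEV;
	a->active++;
	rc = store(a);
	a->active--;
	return rc;
}
```

Calling demo_call_store() with demo_store_direct fails with -EDEADLK and
leaves the attribute registered.  With demo_store_deferred the call
succeeds, and the attribute disappears only when demo_run_pending() runs
after the callback has returned.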


It's true that this approach goes against the general philosophy used
elsewhere in the kernel.  Refcounting without synchronization is the
general rule.

However, unbinding is a special case.  Normally with refcounting, it
doesn't matter when a driver tries to read or write a data structure.  So
long as the driver still holds a reference, the data will be there and the
access will be okay.

But not with unbinding!  After unbinding, the data will still be there but 
it might be owned by another driver.  Even worse, instead of just 
accessing data the code might try to access the device.

The basic assumption behind the refcounting approach is that a resource 
will be used for one purpose and then discarded, so accessing it will 
always be okay.  Driver unbinding violates this assumption; the devices 
and data structures that a driver binds to can be bound, unbound, and 
rebound multiple times.  Simple refcounting isn't sufficient to handle the 
situation.


In short, I think Oliver's original patch should be reinstated.  
(sdev_store_delete() can easily be rewritten to use a workqueue.)  Not
only that, it should be expanded upon.  For instance, it handles regular
sysfs files but it ignores binary files -- a clear oversight.  Debugfs and
the char device subsystem should be modified similarly.  Maybe also
procfs, and perhaps others.

Alan Stern



* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 16:29                 ` Hugh Dickins
@ 2007-03-15 16:51                   ` Linus Torvalds
  0 siblings, 0 replies; 30+ messages in thread
From: Linus Torvalds @ 2007-03-15 16:51 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Stern, Cornelia Huck, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list



On Thu, 15 Mar 2007, Hugh Dickins wrote:
> 
> sysfs_access_in_other_task() left me wondering what this "other" task
> was, and what kind of "access" it's trying to get - or is the calling
> task the other, and it's trying to access something it wouldn't
> directly have access to?

For naming clashes, I'd suggest:

 - try to name according to *why* something is done, not necessarily what 
   it does.

   For example, is it really in "another task"? Maybe it's just an 
   on-demand thread of the same task?  Do you actually care how the 
   deferred work is done?

 - avoid being vague. I agree with not liking the name much, and the 
   "other" thing bothers me. Like Hugh, it makes me ask "_What_ other 
   task?"

So I would suggest not concentrating on some implementation issue, but on 
the reason why you need it in the first place. Namely that you want to 
defer the actual action to avoid deadlock due to recursive locking. So 
that "why do I actually do this" thing implies something like 
"sysfs_store_async()" or "sysfs_store_deferred()" or maybe actually 
concentrate on the locking angle and say something like 
"sysfs_store_needs_to_reacquire_lock()".

(That last one wasn't really serious - it's too long and cumbersome, but 
it's an example of not caring _how_ you do it, just about what you want 
done).

		Linus


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 14:27               ` Alan Stern
  2007-03-15 15:32                 ` Cornelia Huck
@ 2007-03-15 16:29                 ` Hugh Dickins
  2007-03-15 16:51                   ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Hugh Dickins @ 2007-03-15 16:29 UTC (permalink / raw)
  To: Alan Stern
  Cc: Cornelia Huck, Linus Torvalds, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Alan Stern wrote:
> 
> Personally I don't understand what was wrong with my name.  What's weird 
> or unintuitive about doing something in a different task's context?

The only thing wrong with sysfs_do_something_in_a_different_task_context()
is the length of the name.  "do", that's good, much better than "access".

sysfs_access_in_other_task() left me wondering what this "other" task
was, and what kind of "access" it's trying to get - or is the calling
task the other, and it's trying to access something it wouldn't
directly have access to?

> 
> Dmitry's suggestion is slightly inappropriate because the function doesn't
> take a workstruct as an argument and it isn't itself a workqueue callback.  

True, though since he's saying "work" rather than "workstruct",
I was okay with that: it's a sysfs wrapper to schedule_work().

> 
> Would people be happier with sysfs_schedule_callback() and
> device_schedule_callback()?  At least the functions do take a callback 
> pointer as an argument, even though they aren't callbacks themselves.

A lot happier than with sysfs_access_in_other_task() -
if you prefer this to Dmitry's, it's okay by me.

Hugh


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 14:27               ` Alan Stern
@ 2007-03-15 15:32                 ` Cornelia Huck
  2007-03-15 16:29                 ` Hugh Dickins
  1 sibling, 0 replies; 30+ messages in thread
From: Cornelia Huck @ 2007-03-15 15:32 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007 10:27:19 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> Fair enough.  One use of "delay" is in a comment you wrote; I'll change it 
> as well.

Fine with me.

> Would people be happier with sysfs_schedule_callback() and
> device_schedule_callback()?  At least the functions do take a callback 
> pointer as an argument, even though they aren't callbacks themselves.

Count one happy person here.


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 10:27             ` Cornelia Huck
  2007-03-15 12:31               ` Hugh Dickins
@ 2007-03-15 14:27               ` Alan Stern
  2007-03-15 15:32                 ` Cornelia Huck
  2007-03-15 16:29                 ` Hugh Dickins
  1 sibling, 2 replies; 30+ messages in thread
From: Alan Stern @ 2007-03-15 14:27 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Cornelia Huck wrote:

> > > The naming seems a bit unintuitive, but I don't have a good
> > > alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?
> > 
> > sysfs_work_struct is too generic; other parts of sysfs might also want to
> > use workqueues for different purposes.
> 
> > I don't like calling it "delayed"-anything, because the operations aren't
> > necessarily delayed!  On an SMP system they might even execute before the
> > sysfs_access_in_other_task() call returns.  (Although the two examples we
> > have so far can't do that because of lock contention.)
> 
> Sure. But then you shouldn't refer to "delay" in the comments for the
> functions as well :)

Fair enough.  One use of "delay" is in a comment you wrote; I'll change it 
as well.

> > The major feature added here is that the work takes place in a different 
> > task's context, not that it is delayed.  Hence the choice of names.
> 
> Hm. Perhaps device_schedule_access()?

On Thu, 15 Mar 2007, Hugh Dickins wrote:

> It's really none of my business, I'm merely the reporter of the
> deadlock being fixed, and I don't know my way around sysfs at all ...
> 
> ... but I have to say I share your discomfort with Alan's
> "sysfs_access_in_other_task" naming, it sounded very weird to me.
> 
> Quite apart from this mysterious "other task", I don't understand
> "access" either.
> 
> Perhaps "defer" would best capture the idea of another-task and
> maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?  

On Thu, 15 Mar 2007, Oliver Neukum wrote:

> But we do not wish to defer or delay anything.
> How about: sysfs_action_from_neutral_context  

On Thu, 15 Mar 2007, Dmitry Torokhov wrote:

> How about sysfs_schedule_work? That is what it does - schedules a work
> on a sysfs object and everyone here knows what schedule_work() does.  

On Thu, 15 Mar 2007, Hugh Dickins wrote:

> I'm ashamed to have suggested anything else: certainly gets my vote.

Personally I don't understand what was wrong with my name.  What's weird 
or unintuitive about doing something in a different task's context?

Dmitry's suggestion is slightly inappropriate because the function doesn't
take a workstruct as an argument and it isn't itself a workqueue callback.  

Would people be happier with sysfs_schedule_callback() and
device_schedule_callback()?  At least the functions do take a callback 
pointer as an argument, even though they aren't callbacks themselves.

Alan Stern



* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 13:22                   ` Dmitry Torokhov
@ 2007-03-15 13:59                     ` Hugh Dickins
  0 siblings, 0 replies; 30+ messages in thread
From: Hugh Dickins @ 2007-03-15 13:59 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Oliver Neukum, Cornelia Huck, Alan Stern, Linus Torvalds,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Dmitry Torokhov wrote:
> 
> How about sysfs_schedule_work? That is what it does - schedules a work
> on a sysfs object and everyone here knows what schedule_work() does.

I'm ashamed to have suggested anything else: certainly gets my vote.

Hugh


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 13:02                 ` Oliver Neukum
@ 2007-03-15 13:22                   ` Dmitry Torokhov
  2007-03-15 13:59                     ` Hugh Dickins
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Torokhov @ 2007-03-15 13:22 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hugh Dickins, Cornelia Huck, Alan Stern, Linus Torvalds,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On 3/15/07, Oliver Neukum <oneukum@suse.de> wrote:
> On Thursday, 15 March 2007 13:31, Hugh Dickins wrote:
> > Quite apart from this mysterious "other task", I don't understand
> > "access" either.
> >
> > Perhaps "defer" would best capture the idea of another-task and
> > maybe-delay? sysfs_defer_work(), struct sysfs_deferred_work?
>
> But we do not wish to defer or delay anything.
> How about: sysfs_action_from_neutral_context
>

How about sysfs_schedule_work? That is what it does - schedules a work
on a sysfs object and everyone here knows what schedule_work() does.

-- 
Dmitry


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 12:31               ` Hugh Dickins
@ 2007-03-15 13:02                 ` Oliver Neukum
  2007-03-15 13:22                   ` Dmitry Torokhov
  0 siblings, 1 reply; 30+ messages in thread
From: Oliver Neukum @ 2007-03-15 13:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Cornelia Huck, Alan Stern, Linus Torvalds, Dmitry Torokhov,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thursday, 15 March 2007 13:31, Hugh Dickins wrote:
> Quite apart from this mysterious "other task", I don't understand
> "access" either.
> 
> Perhaps "defer" would best capture the idea of another-task and
> maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?

But we do not wish to defer or delay anything.
How about: sysfs_action_from_neutral_context

	Regards
		Oliver


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 10:27             ` Cornelia Huck
@ 2007-03-15 12:31               ` Hugh Dickins
  2007-03-15 13:02                 ` Oliver Neukum
  2007-03-15 14:27               ` Alan Stern
  1 sibling, 1 reply; 30+ messages in thread
From: Hugh Dickins @ 2007-03-15 12:31 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alan Stern, Linus Torvalds, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Cornelia Huck wrote:
> On Wed, 14 Mar 2007 15:23:10 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
> > 
> > sysfs_work_struct is too generic; other parts of sysfs might also want to
> > use workqueues for different purposes.
> 
> > I don't like calling it "delayed"-anything, because the operations aren't
> > necessarily delayed!  On an SMP system they might even execute before the
> > sysfs_access_in_other_task() call returns.  (Although the two examples we
> > have so far can't do that because of lock contention.)
> 
> Sure. But then you shouldn't refer to "delay" in the comments for the
> functions as well :)
> 
> > The major feature added here is that the work takes place in a different 
> > task's context, not that it is delayed.  Hence the choice of names.
> 
> Hm. Perhaps device_schedule_access()?

It's really none of my business, I'm merely the reporter of the
deadlock being fixed, and I don't know my way around sysfs at all ...

... but I have to say I share your discomfort with Alan's
"sysfs_access_in_other_task" naming, it sounded very weird to me.

Quite apart from this mysterious "other task", I don't understand
"access" either.

Perhaps "defer" would best capture the idea of another-task and
maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?

Hugh


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-14 19:23           ` Alan Stern
@ 2007-03-15 10:27             ` Cornelia Huck
  2007-03-15 12:31               ` Hugh Dickins
  2007-03-15 14:27               ` Alan Stern
  0 siblings, 2 replies; 30+ messages in thread
From: Cornelia Huck @ 2007-03-15 10:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Wed, 14 Mar 2007 15:23:10 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> > > +struct other_task_struct {
> > > +	struct kobject 		*kobj;
> > > +	void			(*func)(void *);
> > > +	void			*data;
> > > +	struct work_struct	work;
> > > +};
> > > +
> > > +static void other_task_work(struct work_struct *work)
> > > +{
> > > +	struct other_task_struct *ots = container_of(work,
> > > +			struct other_task_struct, work);
> > > +
> > > +	(ots->func)(ots->data);
> > > +	kobject_put(ots->kobj);
> > > +	kfree(ots);
> > > +}
> > 
> > The naming seems a bit unintuitive, but I don't have a good
> > alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?
> 
> sysfs_work_struct is too generic; other parts of sysfs might also want to
> use workqueues for different purposes.

> I don't like calling it "delayed"-anything, because the operations aren't
> necessarily delayed!  On an SMP system they might even execute before the
> sysfs_access_in_other_task() call returns.  (Although the two examples we
> have so far can't do that because of lock contention.)

Sure. But then you shouldn't refer to "delay" in the comments for the
functions as well :)

> The major feature added here is that the work takes place in a different 
> task's context, not that it is delayed.  Hence the choice of names.

Hm. Perhaps device_schedule_access()?


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-14 18:43         ` Cornelia Huck
@ 2007-03-14 19:23           ` Alan Stern
  2007-03-15 10:27             ` Cornelia Huck
  0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-03-14 19:23 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Wed, 14 Mar 2007, Cornelia Huck wrote:

> On Wed, 14 Mar 2007 12:12:37 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> > This seems more elegant (not yet tested).  Cornelia, does it look okay to 
> > you?
> 
> Works for me (grouping & ungrouping ctc) and looks sane. Some more
> comments below.

Thank you.

> > +struct other_task_struct {
> > +	struct kobject 		*kobj;
> > +	void			(*func)(void *);
> > +	void			*data;
> > +	struct work_struct	work;
> > +};
> > +
> > +static void other_task_work(struct work_struct *work)
> > +{
> > +	struct other_task_struct *ots = container_of(work,
> > +			struct other_task_struct, work);
> > +
> > +	(ots->func)(ots->data);
> > +	kobject_put(ots->kobj);
> > +	kfree(ots);
> > +}
> 
> The naming seems a bit unintuitive, but I don't have a good
> alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?

sysfs_work_struct is too generic; other parts of sysfs might also want to
use workqueues for different purposes.

I don't like calling it "delayed"-anything, because the operations aren't
necessarily delayed!  On an SMP system they might even execute before the
sysfs_access_in_other_task() call returns.  (Although the two examples we
have so far can't do that because of lock contention.)

The major feature added here is that the work takes place in a different 
task's context, not that it is delayed.  Hence the choice of names.

Alan Stern



* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-14 16:12       ` Alan Stern
@ 2007-03-14 18:43         ` Cornelia Huck
  2007-03-14 19:23           ` Alan Stern
  0 siblings, 1 reply; 30+ messages in thread
From: Cornelia Huck @ 2007-03-14 18:43 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Wed, 14 Mar 2007 12:12:37 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> This seems more elegant (not yet tested).  Cornelia, does it look okay to 
> you?

Works for me (grouping & ungrouping ctc) and looks sane. Some more
comments below.


> +struct other_task_struct {
> +	struct kobject 		*kobj;
> +	void			(*func)(void *);
> +	void			*data;
> +	struct work_struct	work;
> +};
> +
> +static void other_task_work(struct work_struct *work)
> +{
> +	struct other_task_struct *ots = container_of(work,
> +			struct other_task_struct, work);
> +
> +	(ots->func)(ots->data);
> +	kobject_put(ots->kobj);
> +	kfree(ots);
> +}

The naming seems a bit unintuitive, but I don't have a good
alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?

> +
> +/**
> + * sysfs_access_in_other_task - delay access from an attribute method.
> + * @kobj: object we're acting for.
> + * @func: callback function to invoke later.
> + * @data: argument to pass to @func.
> + *
> + * sysfs attribute methods must not unregister themselves or their parent
> + * kobject (which would amount to the same thing).  Attempts to do so will
> + * deadlock, since unregistration is mutually exclusive with driver
> + * callbacks.
> + *
> + * Instead methods can call this routine, which will attempt to allocate
> + * and schedule a workqueue request to carry out the requested function
> + * in the workqueue's process context.
> + *
> + * Returns 0 if the request was submitted, -ENOMEM if storage could not
> + * be allocated.
> + */
> +int sysfs_access_in_other_task(struct kobject *kobj, void (*func)(void *),
> +		void *data)

sysfs_delay_access()?


> +/**
> + * device_access_in_other_task - delay access from an attribute method.
> + * @dev: device.
> + * @func: callback function to invoke later.
> + *
> + * Attribute methods must not unregister themselves or their parent device
> + * (which would amount to the same thing).  Attempts to do so will deadlock,
> + * since unregistration is mutually exclusive with driver callbacks.
> + *
> + * Instead methods can call this routine, which will attempt to allocate
> + * and schedule a workqueue request to carry out the requested function
> + * in the workqueue's process context.
> + *
> + * Returns 0 if the request was submitted, -ENOMEM if storage could not
> + * be allocated.
> + */
> +int device_access_in_other_task(struct device *dev,
> +		void (*func)(struct device *))
> +{
> +	return sysfs_access_in_other_task(&dev->kobj,
> +			(void (*)(void *)) func, dev);
> +}
> +EXPORT_SYMBOL_GPL(device_access_in_other_task);

device_delay_access()?



* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 21:20     ` Linus Torvalds
@ 2007-03-14 16:12       ` Alan Stern
  2007-03-14 18:43         ` Cornelia Huck
  0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-03-14 16:12 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Tue, 13 Mar 2007, Linus Torvalds wrote:

> Could we please make this easier to use by having some common sysfs helper 
> routine for this kind of "delayed_store()" functionality.
> 
> I'm not a huge fan of delayed work at all, but if we have to have it, at 
> least make it one generic function rather than having multiple functions 
> all doing their own workqueue logic for it.

This seems more elegant (not yet tested).  Cornelia, does it look okay to 
you?

Alan Stern


Index: usb-2.6/include/linux/sysfs.h
===================================================================
--- usb-2.6.orig/include/linux/sysfs.h
+++ usb-2.6/include/linux/sysfs.h
@@ -78,6 +78,9 @@ struct sysfs_ops {
 
 #ifdef CONFIG_SYSFS
 
+extern int sysfs_access_in_other_task(struct kobject *kobj,
+		void (*func)(void *), void *data);
+
 extern int __must_check
 sysfs_create_dir(struct kobject *, struct dentry *);
 
@@ -133,6 +136,12 @@ extern int __must_check sysfs_init(void)
 
 #else /* CONFIG_SYSFS */
 
+static inline int sysfs_access_in_other_task(struct kobject *kobj,
+		void (*func)(void *), void *data)
+{
+	return -ENOSYS;
+}
+
 static inline int sysfs_create_dir(struct kobject * k, struct dentry *shadow)
 {
 	return 0;
Index: usb-2.6/fs/sysfs/file.c
===================================================================
--- usb-2.6.orig/fs/sysfs/file.c
+++ usb-2.6/fs/sysfs/file.c
@@ -643,6 +643,59 @@ void sysfs_remove_file_from_group(struct
 }
 EXPORT_SYMBOL_GPL(sysfs_remove_file_from_group);
 
+struct other_task_struct {
+	struct kobject 		*kobj;
+	void			(*func)(void *);
+	void			*data;
+	struct work_struct	work;
+};
+
+static void other_task_work(struct work_struct *work)
+{
+	struct other_task_struct *ots = container_of(work,
+			struct other_task_struct, work);
+
+	(ots->func)(ots->data);
+	kobject_put(ots->kobj);
+	kfree(ots);
+}
+
+/**
+ * sysfs_access_in_other_task - delay access from an attribute method.
+ * @kobj: object we're acting for.
+ * @func: callback function to invoke later.
+ * @data: argument to pass to @func.
+ *
+ * sysfs attribute methods must not unregister themselves or their parent
+ * kobject (which would amount to the same thing).  Attempts to do so will
+ * deadlock, since unregistration is mutually exclusive with driver
+ * callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to carry out the requested function
+ * in the workqueue's process context.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int sysfs_access_in_other_task(struct kobject *kobj, void (*func)(void *),
+		void *data)
+{
+	struct other_task_struct *ots;
+
+	ots = kmalloc(sizeof(*ots), GFP_KERNEL);
+	if (!ots)
+		return -ENOMEM;
+	kobject_get(kobj);
+	ots->kobj = kobj;
+	ots->func = func;
+	ots->data = data;
+	INIT_WORK(&ots->work, other_task_work);
+	schedule_work(&ots->work);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sysfs_access_in_other_task);
+
 
 EXPORT_SYMBOL_GPL(sysfs_create_file);
 EXPORT_SYMBOL_GPL(sysfs_remove_file);
Index: usb-2.6/include/linux/device.h
===================================================================
--- usb-2.6.orig/include/linux/device.h
+++ usb-2.6/include/linux/device.h
@@ -356,6 +356,8 @@ extern int __must_check device_create_bi
 					       struct bin_attribute *attr);
 extern void device_remove_bin_file(struct device *dev,
 				   struct bin_attribute *attr);
+extern int device_access_in_other_task(struct device *dev,
+		void (*func)(struct device *));
 
 /* device resource management */
 typedef void (*dr_release_t)(struct device *dev, void *res);
Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -407,6 +407,30 @@ void device_remove_bin_file(struct devic
 }
 EXPORT_SYMBOL_GPL(device_remove_bin_file);
 
+/**
+ * device_access_in_other_task - delay access from an attribute method.
+ * @dev: device.
+ * @func: callback function to invoke later.
+ *
+ * Attribute methods must not unregister themselves or their parent device
+ * (which would amount to the same thing).  Attempts to do so will deadlock,
+ * since unregistration is mutually exclusive with driver callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to carry out the requested function
+ * in the workqueue's process context.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int device_access_in_other_task(struct device *dev,
+		void (*func)(struct device *))
+{
+	return sysfs_access_in_other_task(&dev->kobj,
+			(void (*)(void *)) func, dev);
+}
+EXPORT_SYMBOL_GPL(device_access_in_other_task);
+
 static void klist_children_get(struct klist_node *n)
 {
 	struct device *dev = container_of(n, struct device, knode_parent);
Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -452,10 +452,22 @@ store_rescan_field (struct device *dev, 
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
+static void sdev_store_delete_callback(struct device *dev)
+{
+	scsi_remove_device(to_scsi_device(dev));
+}
+
 static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf,
 				 size_t count)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	int rc;
+
+	/* An attribute cannot be unregistered by one of its own methods,
+	 * so we have to use device_access_in_other_task().
+	 */
+	rc = device_access_in_other_task(dev, sdev_store_delete_callback);
+	if (rc)
+		count = rc;
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
Index: usb-2.6/drivers/s390/cio/ccwgroup.c
===================================================================
--- usb-2.6.orig/drivers/s390/cio/ccwgroup.c
+++ usb-2.6/drivers/s390/cio/ccwgroup.c
@@ -71,19 +71,31 @@ __ccwgroup_remove_symlinks(struct ccwgro
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
  */
+static void ccwgroup_ungroup_callback(struct device *dev)
+{
+	struct ccwgroup_device *gdev = to_ccwgroupdev(dev);
+
+	__ccwgroup_remove_symlinks(gdev);
+	device_unregister(dev);
+}
+
 static ssize_t
 ccwgroup_ungroup_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
 {
 	struct ccwgroup_device *gdev;
+	int rc;
 
 	gdev = to_ccwgroupdev(dev);
 
 	if (gdev->state != CCWGROUP_OFFLINE)
 		return -EINVAL;
 
-	__ccwgroup_remove_symlinks(gdev);
-	device_unregister(dev);
-
+	/* Note that we cannot unregister the device from one of its
+	 * attribute methods, so we have to delay it.
+	 */
+	rc = device_access_in_other_task(dev, ccwgroup_ungroup_callback);
+	if (rc)
+		count = rc;
 	return count;
 }
 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 20:55       ` Hugh Dickins
  2007-03-13 21:08         ` Dmitry Torokhov
@ 2007-03-13 21:20         ` Alan Stern
  1 sibling, 0 replies; 30+ messages in thread
From: Alan Stern @ 2007-03-13 21:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Hugh Dickins wrote:

> On Tue, 13 Mar 2007, Alan Stern wrote:
> > 
> > On the other hand, a quick survey of the kernel source shows that
> > DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> > for the faint-of-heart!
> 
> Indeed, and faint-hearted Hugh wasn't intending to do so: but
> stout-hearted Alan will need to, won't he, before his patch can go in?

Allow me to point out that the original patch is Oliver's (although I
helped), and it doesn't need to go in -- it needs not to be removed.

Furthermore, I have better things to do with the next month of my time 
than auditing hundreds of routines I don't understand for behavior I 
probably won't be able to recognize.  (Although at 50 a day... hmmm, 
maybe.)

This sounds more like a job for kernel-janitors!


On Tue, 13 Mar 2007, Dmitry Torokhov wrote:

> I think we could rely on subsystems maintainers to let us know if
> there are potential problems. For example I can tell that neither
> input, serio nor gameport subsystems use sysfs to destroy their  
> devices (action on sysfs may cause some other device to be destroyed
> but that should be ok, only self-destruction is not allowed, right?)

Very good points.  USB doesn't do anything like that either.  And right, 
it's okay for a method to destroy other devices; it just can't do anything 
that would lead to its own unregistration.

Alan Stern



* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 18:42   ` Cornelia Huck
@ 2007-03-13 21:20     ` Linus Torvalds
  2007-03-14 16:12       ` Alan Stern
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2007-03-13 21:20 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alan Stern, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list



On Tue, 13 Mar 2007, Cornelia Huck wrote:
> 
> Another call that deadlocked with Oliver's patch is ungroup for s390
> ccwgroup devices. It can be made to work again with a similar patch.

Could we please make this easier to use by having some common sysfs helper 
routine for this kind of "delayed_store()" functionality.

I'm not a huge fan of delayed work at all, but if we have to have it, at 
least make it one generic function rather than having multiple functions 
all doing their own workqueue logic for it.

		Linus
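
The kind of common helper asked for here can be sketched in userspace C. This is only a rough model under stated assumptions, not kernel code: `struct object`, `defer_call()`, `run_deferred()` and `delete_store()` are hypothetical names, a plain linked list stands in for the workqueue, and the reference count is a bare int because the model is single-threaded. It shows one generic routine that takes a reference, queues the callback, and returns 0 or -ENOMEM, so that a store method never unregisters its own object directly.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for a kobject and a workqueue. */
struct object {
	int refs;		/* reference count */
	int registered;		/* 1 while visible to "sysfs" */
};

struct deferred {
	struct deferred *next;
	void (*func)(struct object *);
	struct object *obj;
};

static struct deferred *pending_head;	/* FIFO of queued requests */
static struct deferred **pending_tail = &pending_head;

/* Queue func(obj) to run later.  Returns 0, or -ENOMEM if allocation fails. */
static int defer_call(struct object *obj, void (*func)(struct object *))
{
	struct deferred *d = malloc(sizeof(*d));

	if (!d)
		return -ENOMEM;
	obj->refs++;		/* keep obj alive until the callback has run */
	d->next = NULL;
	d->func = func;
	d->obj = obj;
	*pending_tail = d;
	pending_tail = &d->next;
	return 0;
}

/* Stand-in for the workqueue thread: run everything queued so far. */
static int run_deferred(void)
{
	int n = 0;

	while (pending_head) {
		struct deferred *d = pending_head;

		pending_head = d->next;
		if (!pending_head)
			pending_tail = &pending_head;
		d->func(d->obj);
		d->obj->refs--;	/* drop the reference taken in defer_call() */
		free(d);
		n++;
	}
	return n;
}

static void unregister_object(struct object *obj)
{
	obj->registered = 0;
}

/* A "delete" store method: request removal, report error or bytes consumed. */
static long delete_store(struct object *obj, long count)
{
	int rc = defer_call(obj, unregister_object);

	return rc ? rc : count;
}
```

In this model the object is still registered when `delete_store()` returns; it disappears only once `run_deferred()` has run, which is the property the per-driver workqueue patches in this thread each implement by hand.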


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 20:55       ` Hugh Dickins
@ 2007-03-13 21:08         ` Dmitry Torokhov
  2007-03-13 21:20         ` Alan Stern
  1 sibling, 0 replies; 30+ messages in thread
From: Dmitry Torokhov @ 2007-03-13 21:08 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Stern, Oliver Neukum, Maneesh Soni, gregkh, Richard Purdie,
	James Bottomley, Linus Torvalds, Kernel development list

On 3/13/07, Hugh Dickins <hugh@veritas.com> wrote:
> On Tue, 13 Mar 2007, Alan Stern wrote:
> >
> > On the other hand, a quick survey of the kernel source shows that
> > DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> > for the faint-of-heart!
>
> Indeed, and faint-hearted Hugh wasn't intending to do so: but
> stout-hearted Alan will need to, won't he, before his patch can go in?
>

I think we could rely on subsystems maintainers to let us know if
there are potential problems. For example I can tell that neither
input, serio nor gameport subsystems use sysfs to destroy their
devices (action on sysfs may cause some other device to be destroyed
but that should be ok, only self-destruction is not allowed, right?)

-- 
Dmitry


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 20:09     ` Alan Stern
@ 2007-03-13 20:55       ` Hugh Dickins
  2007-03-13 21:08         ` Dmitry Torokhov
  2007-03-13 21:20         ` Alan Stern
  0 siblings, 2 replies; 30+ messages in thread
From: Hugh Dickins @ 2007-03-13 20:55 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Alan Stern wrote:
> 
> On the other hand, a quick survey of the kernel source shows that
> DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> for the faint-of-heart!

Indeed, and faint-hearted Hugh wasn't intending to do so: but
stout-hearted Alan will need to, won't he, before his patch can go in?


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 19:00   ` Hugh Dickins
@ 2007-03-13 20:09     ` Alan Stern
  2007-03-13 20:55       ` Hugh Dickins
  0 siblings, 1 reply; 30+ messages in thread
From: Alan Stern @ 2007-03-13 20:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Hugh Dickins wrote:

> On Tue, 13 Mar 2007, Alan Stern wrote:
> > 
> > The consensus is that we would be better off keeping Oliver's original 
> > patch without your silly change, and instead fixing the particular method 
> > call that deadlocked.  Can you please try out the patch below with 
> > everything else as it was before?  It should solve your problem.
> 
> Yep, it works fine with your patch in and my silly reverted, thanks.
> But (I was about to say, even before seeing Cornelia's reply, honest!)
> I think you do need to check (audit the source? or is some runtime
> check possible?) for other such "suicidal" sysfs files, which
> seemed to (sysfs-ignorant) me to pose the real problem.

A runtime check wouldn't detect anything until someone tried to use the 
file -- at which point the process would deadlock anyway.

On the other hand, a quick survey of the kernel source shows that
DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
for the faint-of-heart!

Alan Stern
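
To make the point about runtime checks concrete, here is a small single-threaded model of the self-deadlock, written as a sketch under stated assumptions. None of these names exist in the kernel: `model_sem` stands in for a non-recursive sleeping lock such as buffer->sem, and `model_down()` returns -EDEADLK at the point where a real down() would sleep forever. The "suicidal" method is indistinguishable from a harmless one until it is actually invoked.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a non-recursive sleeping lock such as buffer->sem. */
struct model_sem {
	int held;
};

/*
 * Returns 0 on success, or -EDEADLK where a real down() would sleep
 * forever (the model is single-threaded, so nobody else could release it).
 */
static int model_down(struct model_sem *s)
{
	if (s->held)
		return -EDEADLK;
	s->held = 1;
	return 0;
}

static void model_up(struct model_sem *s)
{
	s->held = 0;
}

struct model_attr {
	struct model_sem sem;
	int (*store)(struct model_attr *attr);
};

/* Removal must wait for any method in progress, so it takes the lock. */
static int model_remove(struct model_attr *attr)
{
	int rc = model_down(&attr->sem);

	if (rc)
		return rc;
	model_up(&attr->sem);
	return 0;
}

/* A "suicidal" store method: it removes its own attribute. */
static int suicidal_store(struct model_attr *attr)
{
	return model_remove(attr);
}

/* A harmless store method. */
static int plain_store(struct model_attr *attr)
{
	(void)attr;
	return 0;
}

/* The write path holds the lock across the method call. */
static int model_write(struct model_attr *attr)
{
	int rc = model_down(&attr->sem);

	if (rc)
		return rc;
	rc = attr->store(attr);
	model_up(&attr->sem);
	return rc;
}
```

Calling `model_write()` on an attribute whose method is `suicidal_store` reports -EDEADLK, while the same removal issued from outside the method succeeds.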



* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 15:00 ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
  2007-03-13 18:42   ` Cornelia Huck
@ 2007-03-13 19:00   ` Hugh Dickins
  2007-03-13 20:09     ` Alan Stern
  1 sibling, 1 reply; 30+ messages in thread
From: Hugh Dickins @ 2007-03-13 19:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Alan Stern wrote:
> 
> The consensus is that we would be better off keeping Oliver's original 
> patch without your silly change, and instead fixing the particular method 
> call that deadlocked.  Can you please try out the patch below with 
> everything else as it was before?  It should solve your problem.

Yep, it works fine with your patch in and my silly reverted, thanks.
But (I was about to say, even before seeing Cornelia's reply, honest!)
I think you do need to check (audit the source? or is some runtime
check possible?) for other such "suicidal" sysfs files, which
seemed to (sysfs-ignorant) me to pose the real problem.

Hugh


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 15:00 ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
@ 2007-03-13 18:42   ` Cornelia Huck
  2007-03-13 21:20     ` Linus Torvalds
  2007-03-13 19:00   ` Hugh Dickins
  1 sibling, 1 reply; 30+ messages in thread
From: Cornelia Huck @ 2007-03-13 18:42 UTC (permalink / raw)
  To: Alan Stern
  Cc: Hugh Dickins, Dmitry Torokhov, Oliver Neukum, Maneesh Soni,
	gregkh, Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007 11:00:21 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> The consensus is that we would be better off keeping Oliver's original 
> patch without your silly change, and instead fixing the particular method 
> call that deadlocked.

Another call that deadlocked with Oliver's patch is ungroup for s390
ccwgroup devices. It can be made to work again with a similar patch.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

---
 drivers/s390/cio/ccwgroup.c |   35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

--- linux-2.6.orig/drivers/s390/cio/ccwgroup.c
+++ linux-2.6/drivers/s390/cio/ccwgroup.c
@@ -67,22 +67,49 @@ __ccwgroup_remove_symlinks(struct ccwgro
 	
 }
 
+struct ccwgroup_work_struct {
+	struct ccwgroup_device *gdev;
+	struct work_struct work;
+};
+
+static void ccwgroup_ungroup_work(struct work_struct *work)
+{
+	struct ccwgroup_work_struct *ungroup_work
+		= container_of(work, struct ccwgroup_work_struct, work);
+
+	__ccwgroup_remove_symlinks(ungroup_work->gdev);
+	device_unregister(&ungroup_work->gdev->dev);
+	put_device(&ungroup_work->gdev->dev);
+	kfree(ungroup_work);
+}
+
 /*
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
+ * Note that we cannot unregister the device from one of its attribute
+ * methods, so we have to delay it.
  */
-static ssize_t
-ccwgroup_ungroup_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
+static ssize_t ccwgroup_ungroup_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
 {
 	struct ccwgroup_device *gdev;
+	struct ccwgroup_work_struct *ungroup_work;
 
 	gdev = to_ccwgroupdev(dev);
 
 	if (gdev->state != CCWGROUP_OFFLINE)
 		return -EINVAL;
 
-	__ccwgroup_remove_symlinks(gdev);
-	device_unregister(dev);
+	ungroup_work = kmalloc(sizeof(*ungroup_work), GFP_KERNEL);
+	if (!ungroup_work)
+		return -ENOMEM;
+	ungroup_work->gdev = gdev;
+	INIT_WORK(&ungroup_work->work, ccwgroup_ungroup_work);
+	if (!get_device(&gdev->dev))
+		kfree(ungroup_work);
+	else
+		schedule_work(&ungroup_work->work);
 
 	return count;
 }


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-12 21:31 refcounting drivers' data structures used in sysfs buffers Richard Purdie
@ 2007-03-13 15:00 ` Alan Stern
  2007-03-13 18:42   ` Cornelia Huck
  2007-03-13 19:00   ` Hugh Dickins
  0 siblings, 2 replies; 30+ messages in thread
From: Alan Stern @ 2007-03-13 15:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 6 Mar 2007, Hugh Dickins wrote:

> But suspend to RAM is still hanging, unless I "chmod a-x /usr/sbin/docker"
> on SuSE 10.2: docker undock tries to unregister /sys/block/sr0 and hangs:
> 
> 60x60         D B0415080     0 10778  10771                     (NOTLB)
>        e8227e04 00000086 e80c60b0 b0415080 ef3f5454 b041dc20 ef3f5430 00000001 
>        e80c60b0 72af360e 00000085 00001941 e80c61bc e8227e00 b01606bf ef47d3c0 
>        ed07c1dc ed07c1e4 00000246 e8227e30 b02f6ef0 e80c60b0 00000001 e80c60b0 
> Call Trace:
>  [<b02f6ef0>] __down+0xaa/0xb8
>  [<b02f6de6>] __down_failed+0xa/0x10
>  [<b0180529>] sysfs_drop_dentry+0xa2/0xda
>  [<b01819b3>] __sysfs_remove_dir+0x6d/0xf8
>  [<b0181a53>] sysfs_remove_dir+0x15/0x20
>  [<b01d49a9>] kobject_del+0x16/0x22
>  [<b0230041>] device_del+0x1c9/0x1e2
>  [<b025705a>] __scsi_remove_device+0x43/0x7a
>  [<b02570b0>] scsi_remove_device+0x1f/0x2b
>  [<b0256a44>] sdev_store_delete+0x16/0x1b
>  [<b022f0a0>] dev_attr_store+0x32/0x34
>  [<b0180931>] flush_write_buffer+0x37/0x3d
>  [<b0180995>] sysfs_write_file+0x5e/0x82
>  [<b01507f5>] vfs_write+0xa7/0x150
>  [<b0150950>] sys_write+0x47/0x6b
>  [<b0103d56>] sysenter_past_esp+0x5f/0x85
>               /usr/lib/dockutils/hooks/thinkpad/60x60 undock
>               /usr/lib/dockutils/dockhandler undock
>               /usr/sbin/docker undock
>               /etc/pm/hooks/23dock suspend
> 
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.
> 
> Suspend no longer deadlocks with the following silly patch, but I expect
> this either pokes a small hole in your scheme, or renders it pointless.
> Maybe that commit needs to be reverted, or maybe you can see how to fix
> it up for -rc3.
> 
> Thanks,
> Hugh
> 
> --- 2.6.21-rc2-git5/fs/sysfs/inode.c	2007-02-28 08:30:26.000000000 +0000
> +++ linux/fs/sysfs/inode.c	2007-03-06 18:03:13.000000000 +0000
> @@ -227,11 +227,8 @@ static inline void orphan_all_buffers(st
>  
>  	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
>  	if (node->i_private) {
> -		list_for_each_entry(buf, &set->associates, associates) {
> -			down(&buf->sem);
> +		list_for_each_entry(buf, &set->associates, associates)
>  			buf->orphaned = 1;
> -			up(&buf->sem);
> -		}
>  	}
>  	mutex_unlock(&node->i_mutex);
>  }

Hugh, there has been a long discussion among several people concerning 
this issue.  See for example this thread:

http://marc.info/?t=117335935200001&r=1&w=2

and also:

http://marc.info/?l=linux-kernel&m=117355959020831&w=2

The consensus is that we would be better off keeping Oliver's original 
patch without your silly change, and instead fixing the particular method 
call that deadlocked.  Can you please try out the patch below with 
everything else as it was before?  It should solve your problem.

Alan Stern


Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -452,10 +452,39 @@ store_rescan_field (struct device *dev, 
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
+/* An attribute method cannot unregister itself, so this workaround for
+ * sdev_store_delete() is necessary.
+ */
+struct sdev_work_struct {
+	struct scsi_device *sdev;
+	struct work_struct work;
+};
+
+static void sdev_store_delete_work(struct work_struct *work)
+{
+	struct sdev_work_struct *sdw = container_of(work,
+			struct sdev_work_struct, work);
+
+	scsi_remove_device(sdw->sdev);
+	scsi_device_put(sdw->sdev);
+	kfree(sdw);
+}
+
 static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf,
 				 size_t count)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	struct scsi_device *sdev = to_scsi_device(dev);
+	struct sdev_work_struct *sdw;
+
+	sdw = kmalloc(sizeof(*sdw), GFP_KERNEL);
+	if (!sdw)
+		return -ENOMEM;
+	sdw->sdev = sdev;
+	INIT_WORK(&sdw->work, sdev_store_delete_work);
+	if (scsi_device_get(sdev) != 0)
+		kfree(sdw);
+	else
+		schedule_work(&sdw->work);
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);



* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07 18:02         ` Linus Torvalds
@ 2007-03-07 18:16           ` Oliver Neukum
  0 siblings, 0 replies; 30+ messages in thread
From: Oliver Neukum @ 2007-03-07 18:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dmitry Torokhov, Hugh Dickins, Maneesh Soni, Greg Kroah-Hartman,
	Adrian Bunk, linux-kernel

On Wednesday, 7 March 2007 19:02, Linus Torvalds wrote:
> 
> On Wed, 7 Mar 2007, Oliver Neukum wrote:
> >
> > The problem also exists with unplugging devices. Drivers get no feedback
> > to tell them when it is safe to free the data structures associated with
> > an attribute.
> 
> So you just pointed to *another* data structure that apparently violates 
> the "you MUST use refcounting" rule.
> 
> What is it with you people? It's really simple. Data structures must be 
> refcounted if you can reach them two different ways.
> 
> If you don't use refcounting, then you'd better make sure that the data 
> can be reached only one way (for example, by *not* exposing it for sysfs).
> 
> It really *is* that simple. Read the CodingStyle rules.

Very well, there seems to be no clean way to avoid that work.

	Regards
		Oliver


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07 16:59       ` Oliver Neukum
@ 2007-03-07 18:02         ` Linus Torvalds
  2007-03-07 18:16           ` Oliver Neukum
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2007-03-07 18:02 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Dmitry Torokhov, Hugh Dickins, Oliver Neukum, Maneesh Soni,
	Greg Kroah-Hartman, Adrian Bunk, linux-kernel



On Wed, 7 Mar 2007, Oliver Neukum wrote:
>
> The problem also exists with unplugging devices. Drivers get no feedback
> to tell them when it is safe to free the data structures associated with
> an attribute.

So you just pointed to *another* data structure that apparently violates 
the "you MUST use refcounting" rule.

What is it with you people? It's really simple. Data structures must be 
refcounted if you can reach them two different ways.

If you don't use refcounting, then you'd better make sure that the data 
can be reached only one way (for example, by *not* exposing it for sysfs).

It really *is* that simple. Read the CodingStyle rules.

		Linus


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07 16:52     ` Linus Torvalds
@ 2007-03-07 16:59       ` Oliver Neukum
  2007-03-07 18:02         ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Oliver Neukum @ 2007-03-07 16:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dmitry Torokhov, Hugh Dickins, Oliver Neukum, Maneesh Soni,
	Greg Kroah-Hartman, Adrian Bunk, linux-kernel

On Wednesday, 7 March 2007 17:52, Linus Torvalds wrote:
> 
> On Wed, 7 Mar 2007, Dmitry Torokhov wrote:
> > 
> > ... with the exception that it will again make data associated with
> > sysfs attributes accessible past the point of returning from
> > sysfs_remove_file. And that was the point so drivers would not have to
> > care about handling access to extra data (such as static strings) past
> > the driver unload.
> 
> Drivers are unloaded by stopping the whole machine (exactly because module 
> unload is otherwise so hard to handle), so that never happens unless you 
> actively block. In other words, if you do something as simple as

The problem also exists with unplugging devices. Drivers get no feedback
to tell them when it is safe to free the data structures associated with
an attribute.

	Regards
		Oliver


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07 15:56   ` Dmitry Torokhov
@ 2007-03-07 16:52     ` Linus Torvalds
  2007-03-07 16:59       ` Oliver Neukum
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2007-03-07 16:52 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Hugh Dickins, Oliver Neukum, Maneesh Soni, Greg Kroah-Hartman,
	Adrian Bunk, linux-kernel



On Wed, 7 Mar 2007, Dmitry Torokhov wrote:
> 
> ... with the exception that it will again make data associated with
> sysfs attributes accessible past the point of returning from
> sysfs_remove_file. And that was the point so drivers would not have to
> care about handling access to extra data (such as static strings) past
> the driver unload.

Drivers are unloaded by stopping the whole machine (exactly because module 
unload is otherwise so hard to handle), so that never happens unless you 
actively block. In other words, if you do something as simple as

	if (inode->i_private_data)
		sysfs_flush_buffer(buffer);

then there is no race with unloading (unless the driver itself does 
something stupid, of course - but the whole point of having a kernel 
buffer is so that it does *not* have to make user accesses etc).

But the one thing you should *not* do is to depend on a sleeping lock, 
because that breaks the whole model!

			Linus


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07  1:56 ` Linus Torvalds
  2007-03-07 14:38   ` Oliver Neukum
@ 2007-03-07 15:56   ` Dmitry Torokhov
  2007-03-07 16:52     ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Dmitry Torokhov @ 2007-03-07 15:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Oliver Neukum, Maneesh Soni, Greg Kroah-Hartman,
	Adrian Bunk, linux-kernel

On 3/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>  - removing the buffer is now just
>
>        mutex_lock(&inode->i_mutex);
>        buffer = inode->i_private;
>        inode->i_private = NULL;
>        mutex_unlock(&inode->i_mutex);
>
>        put_sysfs_buffer(buffer);
>
>  - everybody is happy!
>

... with the exception that it will again make data associated with
sysfs attributes accessible past the point of returning from
sysfs_remove_file. And that was the point so drivers would not have to
care about handling access to extra data (such as static strings) past
the driver unload.

I wonder if we should keep Oliver's change and require attribute
implementations to offload "delete me" kind of actions to workqueues.

-- 
Dmitry


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07  1:56 ` Linus Torvalds
@ 2007-03-07 14:38   ` Oliver Neukum
  2007-03-07 15:56   ` Dmitry Torokhov
  1 sibling, 0 replies; 30+ messages in thread
From: Oliver Neukum @ 2007-03-07 14:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Maneesh Soni, Greg Kroah-Hartman, Adrian Bunk,
	linux-kernel

On Wednesday, 7 March 2007 02:56, Linus Torvalds wrote:
> Anyway, I'm unable to revert the broken commit, since there are now other 
> changes that depend on it, but can somebody *please* do that? I'll apply 
> Hugh's silly patch in the meantime, just to avoid the lockup.

As you like it. This patch reverts it.

	Regards
		Oliver

Signed-off-by: Oliver Neukum <oliver@neukum.name>
-----

--- orig/fs/sysfs/inode.c	2007-03-07 10:49:42.000000000 +0100
+++ linux-2.6.21-rc3/fs/sysfs/inode.c	2007-03-07 10:52:56.000000000 +0100
@@ -220,20 +220,6 @@
 	return NULL;
 }
 
-static inline void orphan_all_buffers(struct inode *node)
-{
-	struct sysfs_buffer_collection *set = node->i_private;
-	struct sysfs_buffer *buf;
-
-	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
-	if (node->i_private) {
-		list_for_each_entry(buf, &set->associates, associates)
-			buf->orphaned = 1;
-	}
-	mutex_unlock(&node->i_mutex);
-}
-
-
 /*
  * Unhashes the dentry corresponding to given sysfs_dirent
  * Called with parent inode's i_mutex held.
@@ -241,23 +227,16 @@
 void sysfs_drop_dentry(struct sysfs_dirent * sd, struct dentry * parent)
 {
 	struct dentry * dentry = sd->s_dentry;
-	struct inode *inode;
 
 	if (dentry) {
 		spin_lock(&dcache_lock);
 		spin_lock(&dentry->d_lock);
 		if (!(d_unhashed(dentry) && dentry->d_inode)) {
-			inode = dentry->d_inode;
-			spin_lock(&inode->i_lock);
-			__iget(inode);
-			spin_unlock(&inode->i_lock);
 			dget_locked(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 			simple_unlink(parent->d_inode, dentry);
-			orphan_all_buffers(inode);
-			iput(inode);
 		} else {
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
--- orig/fs/sysfs/file.c	2007-03-07 10:37:28.000000000 +0100
+++ linux-2.6.21-rc3/fs/sysfs/file.c	2007-03-07 10:54:00.000000000 +0100
@@ -7,7 +7,6 @@
 #include <linux/kobject.h>
 #include <linux/namei.h>
 #include <linux/poll.h>
-#include <linux/list.h>
 #include <asm/uaccess.h>
 #include <asm/semaphore.h>
 
@@ -52,30 +51,6 @@
 };
 
 /**
- *	add_to_collection - add buffer to a collection
- *	@buffer:	buffer to be added
- *	@node:		inode of set to add to
- */
-
-static inline void
-add_to_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-	struct sysfs_buffer_collection *set = node->i_private;
-
-	mutex_lock(&node->i_mutex);
-	list_add(&buffer->associates, &set->associates);
-	mutex_unlock(&node->i_mutex);
-}
-
-static inline void
-remove_from_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-	mutex_lock(&node->i_mutex);
-	list_del(&buffer->associates);
-	mutex_unlock(&node->i_mutex);
-}
-
-/**
  *	fill_read_buffer - allocate and fill buffer from object.
  *	@dentry:	dentry pointer.
  *	@buffer:	data buffer for file.
@@ -168,10 +143,6 @@
 	ssize_t retval = 0;
 
 	down(&buffer->sem);
-	if (buffer->orphaned) {
-		retval = -ENODEV;
-		goto out;
-	}
 	if (buffer->needs_read_fill) {
 		if ((retval = fill_read_buffer(file->f_path.dentry,buffer)))
 			goto out;
@@ -261,16 +232,11 @@
 	ssize_t len;
 
 	down(&buffer->sem);
-	if (buffer->orphaned) {
-		len = -ENODEV;
-		goto out;
-	}
 	len = fill_write_buffer(buffer, buf, count);
 	if (len > 0)
 		len = flush_write_buffer(file->f_path.dentry, buffer, len);
 	if (len > 0)
 		*ppos += len;
-out:
 	up(&buffer->sem);
 	return len;
 }
@@ -279,7 +245,6 @@
 {
 	struct kobject *kobj = sysfs_get_kobject(file->f_path.dentry->d_parent);
 	struct attribute * attr = to_attr(file->f_path.dentry);
-	struct sysfs_buffer_collection *set;
 	struct sysfs_buffer * buffer;
 	struct sysfs_ops * ops = NULL;
 	int error = 0;
@@ -309,18 +274,6 @@
 	if (!ops)
 		goto Eaccess;
 
-	/* make sure we have a collection to add our buffers to */
-	mutex_lock(&inode->i_mutex);
-	if (!(set = inode->i_private)) {
-		if (!(set = inode->i_private = kmalloc(sizeof(struct sysfs_buffer_collection), GFP_KERNEL))) {
-			error = -ENOMEM;
-			goto Done;
-		} else {
-			INIT_LIST_HEAD(&set->associates);
-		}
-	}
-	mutex_unlock(&inode->i_mutex);
-
 	/* File needs write support.
 	 * The inode's perms must say it's ok, 
 	 * and we must have a store method.
@@ -346,11 +299,9 @@
 	 */
 	buffer = kzalloc(sizeof(struct sysfs_buffer), GFP_KERNEL);
 	if (buffer) {
-		INIT_LIST_HEAD(&buffer->associates);
 		init_MUTEX(&buffer->sem);
 		buffer->needs_read_fill = 1;
 		buffer->ops = ops;
-		add_to_collection(buffer, inode);
 		file->private_data = buffer;
 	} else
 		error = -ENOMEM;
@@ -375,8 +326,6 @@
 	struct module * owner = attr->owner;
 	struct sysfs_buffer * buffer = filp->private_data;
 
-	if (buffer)
-		remove_from_collection(buffer, inode);
 	kobject_put(kobj);
 	/* After this point, attr should not be accessed. */
 	module_put(owner);
--- orig/fs/sysfs/mount.c	2007-03-07 10:37:50.000000000 +0100
+++ linux-2.6.21-rc3/fs/sysfs/mount.c	2007-03-07 10:38:14.000000000 +0100
@@ -19,12 +19,9 @@
 struct super_block * sysfs_sb = NULL;
 struct kmem_cache *sysfs_dir_cachep;
 
-static void sysfs_clear_inode(struct inode *inode);
-
 static const struct super_operations sysfs_ops = {
 	.statfs		= simple_statfs,
 	.drop_inode	= sysfs_delete_inode,
-	.clear_inode	= sysfs_clear_inode,
 };
 
 static struct sysfs_dirent sysfs_root = {
@@ -35,11 +32,6 @@
 	.s_iattr	= NULL,
 };
 
-static void sysfs_clear_inode(struct inode *inode)
-{
-	kfree(inode->i_private);
-}
-
 static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct inode *inode;
--- orig/fs/sysfs/sysfs.h	2007-03-07 10:38:44.000000000 +0100
+++ linux-2.6.21-rc3/fs/sysfs/sysfs.h	2007-03-07 10:39:26.000000000 +0100
@@ -46,21 +46,15 @@
 };
 
 struct sysfs_buffer {
-	struct list_head		associates;
 	size_t				count;
 	loff_t				pos;
 	char				* page;
 	struct sysfs_ops		* ops;
 	struct semaphore		sem;
-	int				orphaned;
 	int				needs_read_fill;
 	int				event;
 };
 
-struct sysfs_buffer_collection {
-	struct list_head	associates;
-};
-
 static inline struct kobject * to_kobj(struct dentry * dentry)
 {
 	struct sysfs_dirent * sd = dentry->d_fsdata;


* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-06 19:20 Hugh Dickins
  2007-03-06 20:16 ` Oliver Neukum
@ 2007-03-07  1:56 ` Linus Torvalds
  2007-03-07 14:38   ` Oliver Neukum
  2007-03-07 15:56   ` Dmitry Torokhov
  1 sibling, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2007-03-07  1:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oliver Neukum, Maneesh Soni, Greg Kroah-Hartman, Adrian Bunk,
	linux-kernel



On Tue, 6 Mar 2007, Hugh Dickins wrote:
> 
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.

Gaah. What a crock.

I really don't see any alternative to just reverting the whole change. 
Hugh's patch is simple, but rather pointless.

The fact is, the whole change is *bogus*.

We don't "lock" datastructures. We *reference count* them!

This is so fundamental that it's even mentioned in the file 
Documentation/CodingStyle in "Chapter 11: Data structures".

The whole "orphaned" kind of locking is broken. It's stupid. The way we do 
races between removal and use is that initial setup sets a reference count 
of 1, and something really simple like:

	static inline struct sysfs_buffer *get_sysfs_buffer(struct inode *inode)
	{
		struct sysfs_buffer *buffer = inode->i_private;

		BUG_ON(!mutex_is_locked(&inode->i_mutex));
		if (buffer)
			atomic_inc(&buffer->count);
		return buffer;
	}

	static inline void put_sysfs_buffer(struct sysfs_buffer *buffer)
	{
		if (atomic_dec_and_test(&buffer->count))
			kfree(buffer);
	}

and then the rule is:

 - everybody uses "get_sysfs_buffer()" to follow the reference (and yes, 
   you obviously have to hold "inode->i_mutex" for this to be safe! I 
   added the BUG_ON() as an example)

 - everybody uses "put_sysfs_buffer()" to release it (and we simply don't *care*
   whether somebody else released it too, since everybody has a reference 
   count)

 - removing the buffer is now just

	mutex_lock(&inode->i_mutex);
	buffer = inode->i_private;
	inode->i_private = NULL;
	mutex_unlock(&inode->i_mutex);

	put_sysfs_buffer(buffer);

 - everybody is happy!

Anyway, I'm unable to revert the broken commit, since there are now other 
changes that depend on it, but can somebody *please* do that? I'll apply 
Hugh's silly patch in the meantime, just to avoid the lockup.

			Linus

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-06 19:20 Hugh Dickins
@ 2007-03-06 20:16 ` Oliver Neukum
  2007-03-07  1:56 ` Linus Torvalds
  1 sibling, 0 replies; 30+ messages in thread
From: Oliver Neukum @ 2007-03-06 20:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oliver Neukum, Maneesh Soni, Greg Kroah-Hartman, Adrian Bunk,
	Linus Torvalds, linux-kernel

On Tuesday, 6 March 2007 20:20, Hugh Dickins wrote:
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.

I had not thought about a write to a sysfs file removing files in sysfs.

> Suspend no longer deadlocks with the following silly patch, but I expect
> this either pokes a small hole in your scheme, or renders it pointless.

The latter.
 
> Maybe that commit needs to be reverted, or maybe you can see how to fix
> it up for -rc3.

If you want a quick fix, a work queue could be used, but it's a kludge.
Suggestions, anybody?

	Regards
		Oliver

* 2.6.21-rc suspend regression: sysfs deadlock
@ 2007-03-06 19:20 Hugh Dickins
  2007-03-06 20:16 ` Oliver Neukum
  2007-03-07  1:56 ` Linus Torvalds
  0 siblings, 2 replies; 30+ messages in thread
From: Hugh Dickins @ 2007-03-06 19:20 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Maneesh Soni, Greg Kroah-Hartman, Adrian Bunk, Linus Torvalds,
	linux-kernel

Resume from RAM on a ThinkPad T43p is now happy with Thomas' periodic
tick fix - the most unusable aspect of that for me had been how slow
repeat keys were to start repeating, but that's all fine now.

But suspend to RAM is still hanging, unless I "chmod a-x /usr/sbin/docker"
on SuSE 10.2: docker undock tries to unregister /sys/block/sr0 and hangs:

60x60         D B0415080     0 10778  10771                     (NOTLB)
       e8227e04 00000086 e80c60b0 b0415080 ef3f5454 b041dc20 ef3f5430 00000001 
       e80c60b0 72af360e 00000085 00001941 e80c61bc e8227e00 b01606bf ef47d3c0 
       ed07c1dc ed07c1e4 00000246 e8227e30 b02f6ef0 e80c60b0 00000001 e80c60b0 
Call Trace:
 [<b02f6ef0>] __down+0xaa/0xb8
 [<b02f6de6>] __down_failed+0xa/0x10
 [<b0180529>] sysfs_drop_dentry+0xa2/0xda
 [<b01819b3>] __sysfs_remove_dir+0x6d/0xf8
 [<b0181a53>] sysfs_remove_dir+0x15/0x20
 [<b01d49a9>] kobject_del+0x16/0x22
 [<b0230041>] device_del+0x1c9/0x1e2
 [<b025705a>] __scsi_remove_device+0x43/0x7a
 [<b02570b0>] scsi_remove_device+0x1f/0x2b
 [<b0256a44>] sdev_store_delete+0x16/0x1b
 [<b022f0a0>] dev_attr_store+0x32/0x34
 [<b0180931>] flush_write_buffer+0x37/0x3d
 [<b0180995>] sysfs_write_file+0x5e/0x82
 [<b01507f5>] vfs_write+0xa7/0x150
 [<b0150950>] sys_write+0x47/0x6b
 [<b0103d56>] sysenter_past_esp+0x5f/0x85
              /usr/lib/dockutils/hooks/thinkpad/60x60 undock
              /usr/lib/dockutils/dockhandler undock
              /usr/sbin/docker undock
              /etc/pm/hooks/23dock suspend

This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
while calling flush_write_buffer, and flushing that particular write
buffer entails downing buffer->sem in orphan_all_buffers.
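
The shape of that self-deadlock can be reproduced in user space. This is 
only a sketch under stated assumptions: a default (non-recursive) pthread 
mutex stands in for buffer->sem, and orphan_step() and write_path_demo() 
are hypothetical names for the inner and outer ends of the call trace 
above. The inner step uses pthread_mutex_trylock() so that the program 
reports the problem instead of hanging the way down() does.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t buf_lock;	/* stands in for buffer->sem */
static int inner_would_block;

/* Inner end of the trace (orphan_all_buffers): wants the same lock. */
static void orphan_step(void)
{
	/* A blocking lock here would never return; trylock reports EBUSY. */
	if (pthread_mutex_trylock(&buf_lock) == EBUSY) {
		inner_would_block = 1;
		return;
	}
	pthread_mutex_unlock(&buf_lock);
}

/*
 * Outer end of the trace (sysfs_write_file): holds the lock across the
 * store callback, which ends up in orphan_step() on the same buffer.
 * Returns 1 if the inner step would have blocked on the held lock.
 */
static int write_path_demo(void)
{
	int err;

	inner_would_block = 0;
	err = pthread_mutex_init(&buf_lock, NULL);
	assert(err == 0);

	pthread_mutex_lock(&buf_lock);		/* down(&buffer->sem) */
	orphan_step();				/* store -> ... -> orphan */
	pthread_mutex_unlock(&buf_lock);	/* up(&buffer->sem) */

	pthread_mutex_destroy(&buf_lock);
	return inner_would_block;
}
```

With the lock held by the writer, the inner step can never acquire it in 
the same thread, which is the hang shown in the trace.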

Suspend no longer deadlocks with the following silly patch, but I expect
this either pokes a small hole in your scheme, or renders it pointless.
Maybe that commit needs to be reverted, or maybe you can see how to fix
it up for -rc3.

Thanks,
Hugh

--- 2.6.21-rc2-git5/fs/sysfs/inode.c	2007-02-28 08:30:26.000000000 +0000
+++ linux/fs/sysfs/inode.c	2007-03-06 18:03:13.000000000 +0000
@@ -227,11 +227,8 @@ static inline void orphan_all_buffers(st
 
 	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
 	if (node->i_private) {
-		list_for_each_entry(buf, &set->associates, associates) {
-			down(&buf->sem);
+		list_for_each_entry(buf, &set->associates, associates)
 			buf->orphaned = 1;
-			up(&buf->sem);
-		}
 	}
 	mutex_unlock(&node->i_mutex);
 }

end of thread, other threads:[~2007-03-15 16:52 UTC | newest]

Thread overview: 30+ messages
2007-03-10 20:44 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
  -- strict thread matches above, loose matches on Subject: below --
2007-03-12 21:31 refcounting drivers' data structures used in sysfs buffers Richard Purdie
2007-03-13 15:00 ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
2007-03-13 18:42   ` Cornelia Huck
2007-03-13 21:20     ` Linus Torvalds
2007-03-14 16:12       ` Alan Stern
2007-03-14 18:43         ` Cornelia Huck
2007-03-14 19:23           ` Alan Stern
2007-03-15 10:27             ` Cornelia Huck
2007-03-15 12:31               ` Hugh Dickins
2007-03-15 13:02                 ` Oliver Neukum
2007-03-15 13:22                   ` Dmitry Torokhov
2007-03-15 13:59                     ` Hugh Dickins
2007-03-15 14:27               ` Alan Stern
2007-03-15 15:32                 ` Cornelia Huck
2007-03-15 16:29                 ` Hugh Dickins
2007-03-15 16:51                   ` Linus Torvalds
2007-03-13 19:00   ` Hugh Dickins
2007-03-13 20:09     ` Alan Stern
2007-03-13 20:55       ` Hugh Dickins
2007-03-13 21:08         ` Dmitry Torokhov
2007-03-13 21:20         ` Alan Stern
2007-03-06 19:20 Hugh Dickins
2007-03-06 20:16 ` Oliver Neukum
2007-03-07  1:56 ` Linus Torvalds
2007-03-07 14:38   ` Oliver Neukum
2007-03-07 15:56   ` Dmitry Torokhov
2007-03-07 16:52     ` Linus Torvalds
2007-03-07 16:59       ` Oliver Neukum
2007-03-07 18:02         ` Linus Torvalds
2007-03-07 18:16           ` Oliver Neukum
