linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* refcounting drivers' data structures used in sysfs buffers
@ 2007-03-08 13:05 Oliver Neukum
  2007-03-08 16:02 ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-08 13:05 UTC (permalink / raw)
  To: Dmitry Torokhov, Maneesh Soni, gregkh, Alan Stern, linux-kernel

Hi,

after a lightning bolt from high above I've been looking into refcounting
the data structures drivers use to provide the data used to refill sysfs
buffers. I've come to the following conclusion.

1. struct sysfs_buffer must have a struct kref * and probably a destructor
pointer
2. drivers must be able to pass these pointers through an extended
device_create_file()
3. Drivers must use refcounting if they want to use attributes
4. read/write/poll must do refcounting

I am not sure where to store the pointers. struct sysfs_dirent() looks
like the obvious choice. Comments?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-08 13:05 refcounting drivers' data structures used in sysfs buffers Oliver Neukum
@ 2007-03-08 16:02 ` Alan Stern
  2007-03-09  0:45   ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-08 16:02 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Thu, 8 Mar 2007, Oliver Neukum wrote:

> Hi,
> 
> after a lightning bolt from high above I've been looking into refcounting
> the data structures drivers use to provide the data used to refill sysfs
> buffers. I've come to the following conclusion.
> 
> 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> pointer
> 2. drivers must be able to pass these pointers through an extended
> device_create_file()
> 3. Drivers must use refcounting if they want to use attributes
> 4. read/write/poll must do refcounting
> 
> I am not sure where to store the pointers. struct sysfs_dirent() looks
> like the obvious choice. Comments?

Can you explain the reasoning that led to these conclusions?  And what 
exactly was your lightning bolt?

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-08 16:02 ` Alan Stern
@ 2007-03-09  0:45   ` Oliver Neukum
  2007-03-09 16:32     ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-09  0:45 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> On Thu, 8 Mar 2007, Oliver Neukum wrote:
> 
> > Hi,
> > 
> > after a lightning bolt from high above I've been looking into refcounting
> > the data structures drivers use to provide the data used to refill sysfs
> > buffers. I've come to the following conclusion.
> > 
> > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > pointer
> > 2. drivers must be able to pass these pointers through an extended
> > device_create_file()
> > 3. Drivers must use refcounting if they want to use attributes
> > 4. read/write/poll must do refcounting
> > 
> > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > like the obvious choice. Comments?
> 
> Can you explain the reasoning that led to these conclusions?  And what 
> exactly was your lightning bolt?

The old race between disconnect and IO to attribute via sysfs again.
If I cannot disassociate the drivers from the buffers in the buffers, drivers
must not deallocate the data necessary to answer sysfs callbacks while
a buffer exists. Reading from a buffer must up a refcount in the driver's
data structures. The question becomes how to get a pointer to the buffer.
And it cannot live in the dentry as the dentry can go away while files
are still open. This leaves the inode or the buffer.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09  0:45   ` Oliver Neukum
@ 2007-03-09 16:32     ` Alan Stern
  2007-03-09 16:44       ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-09 16:32 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Oliver Neukum wrote:

> Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> > On Thu, 8 Mar 2007, Oliver Neukum wrote:
> > 
> > > Hi,
> > > 
> > > after a lightning bolt from high above I've been looking into refcounting
> > > the data structures drivers use to provide the data used to refill sysfs
> > > buffers. I've come to the following conclusion.
> > > 
> > > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > > pointer
> > > 2. drivers must be able to pass these pointers through an extended
> > > device_create_file()
> > > 3. Drivers must use refcounting if they want to use attributes
> > > 4. read/write/poll must do refcounting
> > > 
> > > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > > like the obvious choice. Comments?
> > 
> > Can you explain the reasoning that led to these conclusions?  And what 
> > exactly was your lightning bolt?
> 
> The old race between disconnect and IO to attribute via sysfs again.
> If I cannot disassociate the drivers from the buffers in the buffers, drivers
> must not deallocate the data necessary to answer sysfs callbacks while
> a buffer exists.

Why wouldn't you be able to dissociate a driver from a buffer?  That was 
the whole point of adding .orphan to sysfs_buffer and creating 
sysfs_buffer_collection -- it was supposed to solve exactly this race.

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 16:32     ` Alan Stern
@ 2007-03-09 16:44       ` Oliver Neukum
  2007-03-09 17:02         ` Dmitry Torokhov
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-09 16:44 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 17:32 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Oliver Neukum wrote:
> 
> > Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> > > On Thu, 8 Mar 2007, Oliver Neukum wrote:
> > > 
> > > > Hi,
> > > > 
> > > > after a lightning bolt from high above I've been looking into refcounting
> > > > the data structures drivers use to provide the data used to refill sysfs
> > > > buffers. I've come to the following conclusion.
> > > > 
> > > > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > > > pointer
> > > > 2. drivers must be able to pass these pointers through an extended
> > > > device_create_file()
> > > > 3. Drivers must use refcounting if they want to use attributes
> > > > 4. read/write/poll must do refcounting
> > > > 
> > > > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > > > like the obvious choice. Comments?
> > > 
> > > Can you explain the reasoning that led to these conclusions?  And what 
> > > exactly was your lightning bolt?
> > 
> > The old race between disconnect and IO to attribute via sysfs again.
> > If I cannot disassociate the drivers from the buffers in the buffers, drivers
> > must not deallocate the data necessary to answer sysfs callbacks while
> > a buffer exists.
> 
> Why wouldn't you be able to dissociate a driver from a buffer?  That was 
> the whole point of adding .orphan to sysfs_buffer and creating 
> sysfs_buffer_collection -- it was supposed to solve exactly this race.

It did solve the race but deadlocked when unbinding devices through sysfs.
Linux therefore asked for the patch to be reverted and wants the isue solved
with refcounting.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 16:44       ` Oliver Neukum
@ 2007-03-09 17:02         ` Dmitry Torokhov
  2007-03-09 17:18           ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Torokhov @ 2007-03-09 17:02 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

On 3/9/07, Oliver Neukum <oneukum@suse.de> wrote:
> Am Freitag, 9. März 2007 17:32 schrieb Alan Stern:
> > On Fri, 9 Mar 2007, Oliver Neukum wrote:
> >
> > > Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> > > > On Thu, 8 Mar 2007, Oliver Neukum wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > after a lightning bolt from high above I've been looking into refcounting
> > > > > the data structures drivers use to provide the data used to refill sysfs
> > > > > buffers. I've come to the following conclusion.
> > > > >
> > > > > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > > > > pointer
> > > > > 2. drivers must be able to pass these pointers through an extended
> > > > > device_create_file()
> > > > > 3. Drivers must use refcounting if they want to use attributes
> > > > > 4. read/write/poll must do refcounting
> > > > >
> > > > > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > > > > like the obvious choice. Comments?
> > > >
> > > > Can you explain the reasoning that led to these conclusions?  And what
> > > > exactly was your lightning bolt?
> > >
> > > The old race between disconnect and IO to attribute via sysfs again.
> > > If I cannot disassociate the drivers from the buffers in the buffers, drivers
> > > must not deallocate the data necessary to answer sysfs callbacks while
> > > a buffer exists.
> >
> > Why wouldn't you be able to dissociate a driver from a buffer?  That was
> > the whole point of adding .orphan to sysfs_buffer and creating
> > sysfs_buffer_collection -- it was supposed to solve exactly this race.
>
> It did solve the race but deadlocked when unbinding devices through sysfs.
> Linux therefore asked for the patch to be reverted and wants the isue solved
> with refcounting.
>

I think we already have all refcounting that is needed. What is
missing is subsystem-provided ->release() hooks for drivers to release
driver-specific resources when a device finally goes away.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 17:02         ` Dmitry Torokhov
@ 2007-03-09 17:18           ` Oliver Neukum
  2007-03-09 17:34             ` Dmitry Torokhov
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-09 17:18 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:

> I think we already have all refcounting that is needed. What is
> missing is subsystem-provided ->release() hooks for drivers to release
> driver-specific resources when a device finally goes away.

This is an interesting idea. Is it nice to pass through release()
but not open() ?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 17:18           ` Oliver Neukum
@ 2007-03-09 17:34             ` Dmitry Torokhov
  2007-03-09 19:32               ` Alan Stern
  2007-03-09 20:08               ` Alan Stern
  0 siblings, 2 replies; 61+ messages in thread
From: Dmitry Torokhov @ 2007-03-09 17:34 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

On 3/9/07, Oliver Neukum <oneukum@suse.de> wrote:
> Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:
>
> > I think we already have all refcounting that is needed. What is
> > missing is subsystem-provided ->release() hooks for drivers to release
> > driver-specific resources when a device finally goes away.
>
> This is an interesting idea. Is it nice to pass through release()
> but not open() ?
>

Not sure if I follow... Generally speaking open is not a mandatory
operation; however every object in driver model has a release method.
What I am saying is that certain drivers need to have their disconnect
method split in 2 parts - one that shuts down the device and second is
releases resources that might be accesses through sysfs (and other
kernel parts). That second part will have to be called from
subsystem's core ->release() method se we need a release() hook.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 17:34             ` Dmitry Torokhov
@ 2007-03-09 19:32               ` Alan Stern
  2007-03-09 20:05                 ` Oliver Neukum
  2007-03-09 20:08               ` Alan Stern
  1 sibling, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-09 19:32 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Dmitry Torokhov wrote:

> On 3/9/07, Oliver Neukum <oneukum@suse.de> wrote:
> > Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:
> >
> > > I think we already have all refcounting that is needed. What is
> > > missing is subsystem-provided ->release() hooks for drivers to release
> > > driver-specific resources when a device finally goes away.
> >
> > This is an interesting idea. Is it nice to pass through release()
> > but not open() ?
> >
> 
> Not sure if I follow... Generally speaking open is not a mandatory
> operation; however every object in driver model has a release method.
> What I am saying is that certain drivers need to have their disconnect
> method split in 2 parts - one that shuts down the device and second is
> releases resources that might be accesses through sysfs (and other
> kernel parts). That second part will have to be called from
> subsystem's core ->release() method se we need a release() hook.

Dmitry, you're not viewing this correctly.

Adding a new release() callback would solve the problem by creating 
another.  Drivers need to release their data as soon as possible after
they unbind from a device, not when the device itself goes away.  Think
about what would happen if you tried to rmmod a driver.  The rmmod process 
would block until the device was unregistered.

Oliver, your idea won't work either.  Think about what would happen if 
someone did

	rmmod driver_module </sys/devices/.../attribute_file

The rmmod process would never actually read the attribute, so until it 
exited the private data structure would have a positive refcount.  But 
rmmod can't exit until the driver has been unloaded from memory, and it 
can't be unloaded while its data structure is still allocated.  Thus we 
would end up with deadlock; rmmod would hang forever.

It might be better to keep your earlier patch and fix the deadlock you
mentioned earlier, the one that occurs when unbinding a driver through
sysfs.  How exactly does that deadlock work?

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 19:32               ` Alan Stern
@ 2007-03-09 20:05                 ` Oliver Neukum
  2007-03-09 20:27                   ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-09 20:05 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 20:32 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Dmitry Torokhov wrote:
> 
> > On 3/9/07, Oliver Neukum <oneukum@suse.de> wrote:
> > > Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:
> > >
> > > > I think we already have all refcounting that is needed. What is
> > > > missing is subsystem-provided ->release() hooks for drivers to release
> > > > driver-specific resources when a device finally goes away.
> > >
> > > This is an interesting idea. Is it nice to pass through release()
> > > but not open() ?
> > >
> > 
> > Not sure if I follow... Generally speaking open is not a mandatory
> > operation; however every object in driver model has a release method.
> > What I am saying is that certain drivers need to have their disconnect
> > method split in 2 parts - one that shuts down the device and second is
> > releases resources that might be accesses through sysfs (and other
> > kernel parts). That second part will have to be called from
> > subsystem's core ->release() method se we need a release() hook.
> 
> Dmitry, you're not viewing this correctly.
> 
> Adding a new release() callback would solve the problem by creating 
> another.  Drivers need to release their data as soon as possible after
> they unbind from a device, not when the device itself goes away.  Think

Wait, the callback from closing the file in sysfs is the earliest we can safely
free the data structure. How do you want to free earlier?

> about what would happen if you tried to rmmod a driver.  The rmmod process 
> would block until the device was unregistered.
> 
> Oliver, your idea won't work either.  Think about what would happen if 
> someone did
> 
> 	rmmod driver_module </sys/devices/.../attribute_file

-EBUSY, as currently.
 
> The rmmod process would never actually read the attribute, so until it 
> exited the private data structure would have a positive refcount.  But 
> rmmod can't exit until the driver has been unloaded from memory, and it 
> can't be unloaded while its data structure is still allocated.  Thus we 
> would end up with deadlock; rmmod would hang forever.
> 
> It might be better to keep your earlier patch and fix the deadlock you
> mentioned earlier, the one that occurs when unbinding a driver through
> sysfs.  How exactly does that deadlock work?

http://lkml.org/lkml/2007/3/6/364
http://lkml.org/lkml/2007/3/6/528

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 17:34             ` Dmitry Torokhov
  2007-03-09 19:32               ` Alan Stern
@ 2007-03-09 20:08               ` Alan Stern
  2007-03-09 20:48                 ` Oliver Neukum
  1 sibling, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-09 20:08 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Alan Stern wrote:

> Oliver, your idea won't work either.  Think about what would happen if 
> someone did
> 
> 	rmmod driver_module </sys/devices/.../attribute_file
> 
> The rmmod process would never actually read the attribute, so until it 
> exited the private data structure would have a positive refcount.  But 
> rmmod can't exit until the driver has been unloaded from memory, and it 
> can't be unloaded while its data structure is still allocated.  Thus we 
> would end up with deadlock; rmmod would hang forever.

I take this back.  Redirecting stdin to the attribute file would increase 
the module's refcount and cause rmmod to exit immediately with an error.

After some more thought, I basically agree with what Oliver wrote
originally.  sysfs_dirent is indeed the logical place to store the kref
pointer.  However it needs to be used during open and release, not during
read, write, and poll.  Another point, which Oliver didn't think of, is
that the kref pointer needs to be passed to the driver as an argument in
the show() and store() method calls.

Implementing this will be difficult.  One possibility is to change the 
definition of sysfs_ops, adding the new struct kref * argument to the 
prototypes.  This will involve changing _lots_ of source files, adding an 
unused argument to many functions, which isn't attractive.

The other possibility is to test at runtime whether the kref pointer is 
NULL, and if it is, don't pass it.  This would work, but it isn't 
type-safe.

Finally, there's added complexity in each driver which wants to use the 
new facility.  The module_exit routine will need to be smart enough to 
block until all the private data structures have been released.  
usb-storage does something like that now; it's kind of ugly (although it 
could be improved if appropriate support were added to the core kernel).

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 20:05                 ` Oliver Neukum
@ 2007-03-09 20:27                   ` Alan Stern
  2007-03-09 20:39                     ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-09 20:27 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Oliver Neukum wrote:

> > Adding a new release() callback would solve the problem by creating 
> > another.  Drivers need to release their data as soon as possible after
> > they unbind from a device, not when the device itself goes away.  Think
> 
> Wait, the callback from closing the file in sysfs is the earliest we can safely
> free the data structure. How do you want to free earlier?

It is _not_ the earliest we can safely free the data structure.

Dmitry's callback occurs when _all_ the sysfs attributes have been
released -- including ones that don't have anything to do with the
driver's private data structure.  Think of the bInterfaceClass attribute,
for example.

But even aside from that, Dmitry's suggestion is wrong.  He wants to add a
second release() method to the driver, which can be called from the
subsystem's release() method -- which doesn't run until the device is
unregistered, possibly long after the driver has been unbound from it.  
Then how could the subsystem even know which drivers need their second
release method called?


> > Oliver, your idea won't work either.  Think about what would happen if 
> > someone did
> > 
> > 	rmmod driver_module </sys/devices/.../attribute_file
> 
> -EBUSY, as currently.

Yeah, I was wrong about that.


> > It might be better to keep your earlier patch and fix the deadlock you
> > mentioned earlier, the one that occurs when unbinding a driver through
> > sysfs.  How exactly does that deadlock work?
> 
> http://lkml.org/lkml/2007/3/6/364
> http://lkml.org/lkml/2007/3/6/528

I get the picture, thanks.

Alan Stern



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 20:27                   ` Alan Stern
@ 2007-03-09 20:39                     ` Oliver Neukum
  0 siblings, 0 replies; 61+ messages in thread
From: Oliver Neukum @ 2007-03-09 20:39 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 21:27 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Oliver Neukum wrote:
> 
> > > Adding a new release() callback would solve the problem by creating 
> > > another.  Drivers need to release their data as soon as possible after
> > > they unbind from a device, not when the device itself goes away.  Think
> > 
> > Wait, the callback from closing the file in sysfs is the earliest we can safely
> > free the data structure. How do you want to free earlier?
> 
> It is _not_ the earliest we can safely free the data structure.
> 
> Dmitry's callback occurs when _all_ the sysfs attributes have been
> released -- including ones that don't have anything to do with the
> driver's private data structure.  Think of the bInterfaceClass attribute,
> for example.

Ok, yes I see. It is by far too late.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 20:08               ` Alan Stern
@ 2007-03-09 20:48                 ` Oliver Neukum
  2007-03-10 19:19                   ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-09 20:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 21:08 schrieb Alan Stern:
> After some more thought, I basically agree with what Oliver wrote
> originally.  sysfs_dirent is indeed the logical place to store the kref
> pointer.  However it needs to be used during open and release, not during

OK.

> read, write, and poll.  Another point, which Oliver didn't think of, is
> that the kref pointer needs to be passed to the driver as an argument in
> the show() and store() method calls.

Why? What's wrong with simply calling kref_get/put?
 
> Finally, there's added complexity in each driver which wants to use the 
> new facility.  The module_exit routine will need to be smart enough to 
> block until all the private data structures have been released.  
> usb-storage does something like that now; it's kind of ugly (although it 
> could be improved if appropriate support were added to the core kernel).

If we up the module count for every bound device, all device attributes
should be gone before we ever get that far.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 20:48                 ` Oliver Neukum
@ 2007-03-10 19:19                   ` Alan Stern
  2007-03-12  8:54                     ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-10 19:19 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Oliver Neukum wrote:

> Am Freitag, 9. März 2007 21:08 schrieb Alan Stern:
> > After some more thought, I basically agree with what Oliver wrote
> > originally.  sysfs_dirent is indeed the logical place to store the kref
> > pointer.  However it needs to be used during open and release, not during
> 
> OK.
> 
> > read, write, and poll.  Another point, which Oliver didn't think of, is
> > that the kref pointer needs to be passed to the driver as an argument in
> > the show() and store() method calls.
> 
> Why? What's wrong with simply calling kref_get/put?

It's the same old problem: the race between unbind and sysfs I/O.  What
good does holding a reference to the private data structure do if the
show/store method gets called after the driver has been unbound from the
device?  dev_get_drvdata() will no longer provide a valid pointer to the
private data, so the method will have no way to access it.  Hence the
method needs another argument.

(BTW, the sysfs core would actually need more than a kref.  It would also 
need a pointer to a release routine -- the kref contains only the atomic 
counter.  The more you think about it, the more complicated this approach 
becomes.)

> > Finally, there's added complexity in each driver which wants to use the 
> > new facility.  The module_exit routine will need to be smart enough to 
> > block until all the private data structures have been released.  
> > usb-storage does something like that now; it's kind of ugly (although it 
> > could be improved if appropriate support were added to the core kernel).
> 
> If we up the module count for every bound device, all device attributes
> should be gone before we ever get that far.

Not quite right.  However, since every open sysfs file holds a module
reference, if the driver's module_exit has been called then there can be
no open sysfs files, hence no private data still pinned.  Thus this isn't
a problem at all.


But never mind all the above.  I'm going to post another message on this
thread in which I argue that Oliver's original approach was a good one and
should not have been reverted.  The specific problem identified by Hugh
Dickins can be fixed in the way Dmitry first suggested, by doing the real
operation from a workqueue routine.

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-10 19:19                   ` Alan Stern
@ 2007-03-12  8:54                     ` Oliver Neukum
  2007-03-12 14:57                       ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-12  8:54 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Samstag, 10. März 2007 20:19 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Oliver Neukum wrote:
> 
> > Am Freitag, 9. März 2007 21:08 schrieb Alan Stern:
> > > After some more thought, I basically agree with what Oliver wrote
> > > originally.  sysfs_dirent is indeed the logical place to store the kref
> > > pointer.  However it needs to be used during open and release, not during
> > 
> > OK.
> > 
> > > read, write, and poll.  Another point, which Oliver didn't think of, is
> > > that the kref pointer needs to be passed to the driver as an argument in
> > > the show() and store() method calls.
> > 
> > Why? What's wrong with simply calling kref_get/put?
> 
> It's the same old problem: the race between unbind and sysfs I/O.  What
> good does holding a reference to the private data structure do if the
> show/store method gets called after the driver has been unbound from the
> device?  dev_get_drvdata() will no longer provide a valid pointer to the
> private data, so the method will have no way to access it.  Hence the
> method needs another argument.

It does half the job. You can make sure the driver is not asked to access
freed memory.
It is true that a driver will have to mark that device "disconnected"
and return errors if that device's attributes are referenced, but this can
be done internally.

Yes, this is a bit more complicated.
{rant mode}
Who came up with the idea of making life simpler by adding a code path?
All these problems were already solved for device nodes. Ioctl is ugly, but
at least a known code path.
{rant off}

> (BTW, the sysfs core would actually need more than a kref.  It would also 
> need a pointer to a release routine -- the kref contains only the atomic

Yes, this is implied.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12  8:54                     ` Oliver Neukum
@ 2007-03-12 14:57                       ` Alan Stern
  2007-03-12 15:23                         ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-12 14:57 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Oliver Neukum wrote:

> > > Why? What's wrong with simply calling kref_get/put?
> > 
> > It's the same old problem: the race between unbind and sysfs I/O.  What
> > good does holding a reference to the private data structure do if the
> > show/store method gets called after the driver has been unbound from the
> > device?  dev_get_drvdata() will no longer provide a valid pointer to the
> > private data, so the method will have no way to access it.  Hence the
> > method needs another argument.
> 
> It does half the job. You can make sure the driver is not asked to access
> freed memory.
> It is true that a driver will have to mark that device "disconnected"
> and return errors if that device's attributes are referenced, but this can
> be done internally.

No, you're missing the point.  Let's say driver A's disconnect() is
called, so the driver marks its private data structure as "disconnected"
and does dev_set_drvdata(NULL).  Then driver B is probed and bound to the
device, and it does its own dev_set_drvdata().  Then a user still holding
an open sysfs file reference for driver A calls a show() or store()  
method.  The method will do dev_get_drvdata(), receiving the pointer to
driver B's private data.  Now you're in trouble, because A's method will
think it owns B's private data!

> Yes, this is a bit more complicated.
> {rant mode}
> Who came up with the idea of making life simpler by adding a code path?
> All these problems were already solved for device nodes. Ioctl is ugly, but
> at least a known code path.
> {rant off}

I'll let Greg give the complete answer.  :-)  Bear in mind, however, that
the aim was probably to make life simpler for userspace -- which does not
mean making life simpler for the kernel.

(Incidentally, I'm not so sure that all these problems really were solved 
by ioctl on device nodes.  I bet you could find plenty of cases where 
ioctl races with disconnect if you looked.)

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 14:57                       ` Alan Stern
@ 2007-03-12 15:23                         ` Oliver Neukum
  2007-03-12 15:42                           ` Dmitry Torokhov
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-12 15:23 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 15:57 schrieb Alan Stern:probably nece
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
> 
> > > > Why? What's wrong with simply calling kref_get/put?
> > > 
> > > It's the same old problem: the race between unbind and sysfs I/O.  What
> > > good does holding a reference to the private data structure do if the
> > > show/store method gets called after the driver has been unbound from the
> > > device?  dev_get_drvdata() will no longer provide a valid pointer to the
> > > private data, so the method will have no way to access it.  Hence the
> > > method needs another argument.
> > 
> > It does half the job. You can make sure the driver is not asked to access
> > freed memory.
> > It is true that a driver will have to mark that device "disconnected"
> > and return errors if that device's attributes are referenced, but this can
> > be done internally.
> 
> No, you're missing the point.  Let's say driver A's disconnect() is
> called, so the driver marks its private data structure as "disconnected"
> and does dev_set_drvdata(NULL).  Then driver B is probed and bound to the
> device, and it does its own dev_set_drvdata().  Then a user still holding
> an open sysfs file reference for driver A calls a show() or store()  
> method.  The method will do dev_get_drvdata(), receiving the pointer to
> driver B's private data.  Now you're in trouble, because A's method will
> think it owns B's private data!

Yes, I was missing the point. In consequence, drivers must not use
dev_get_drvdata() to get their references to their private data. It's
probably necessary to store it in struct sysfs_buffer and include that
in the store/show callbacks.
(The same does apply to interfaces of course)

> > Yes, this is a bit more complicated.
> > {rant mode}
> > Who came up with the idea of making life simpler by adding a code path?
> > All these problems were already solved for device nodes. Ioctl is ugly, but
> > at least a known code path.
> > {rant off}
> 
> I'll let Greg give the complete answer.  :-)  Bear in mind, however, that
> the aim was probably to make life simpler for userspace -- which does not
> mean making life simpler for the kernel.

That doesn't mean that the method needed to be thrown out.
Sysfs could simply pass through the syscalls for a device, like
it is done in character devices. I am tempted to recommend
such radical surgery.

> (Incidentally, I'm not so sure that all these problems really were solved 
> by ioctl on device nodes.  I bet you could find plenty of cases where 
> ioctl races with disconnect if you looked.)

I will look. Death to all race conditions.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 15:23                         ` Oliver Neukum
@ 2007-03-12 15:42                           ` Dmitry Torokhov
  2007-03-12 15:59                             ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Torokhov @ 2007-03-12 15:42 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

On 3/12/07, Oliver Neukum <oneukum@suse.de> wrote:
> Am Montag, 12. März 2007 15:57 schrieb Alan Stern:probably nece
> > On Mon, 12 Mar 2007, Oliver Neukum wrote:
> >
> > > > > Why? What's wrong with simply calling kref_get/put?
> > > >
> > > > It's the same old problem: the race between unbind and sysfs I/O.  What
> > > > good does holding a reference to the private data structure do if the
> > > > show/store method gets called after the driver has been unbound from the
> > > > device?  dev_get_drvdata() will no longer provide a valid pointer to the
> > > > private data, so the method will have no way to access it.  Hence the
> > > > method needs another argument.
> > >
> > > It does half the job. You can make sure the driver is not asked to access
> > > freed memory.
> > > It is true that a driver will have to mark that device "disconnected"
> > > and return errors if that device's attributes are referenced, but this can
> > > be done internally.
> >
> > No, you're missing the point.  Let's say driver A's disconnect() is
> > called, so the driver marks its private data structure as "disconnected"
> > and does dev_set_drvdata(NULL).  Then driver B is probed and bound to the
> > device, and it does its own dev_set_drvdata().  Then a user still holding
> > an open sysfs file reference for driver A calls a show() or store()
> > method.  The method will do dev_get_drvdata(), receiving the pointer to
> > driver B's private data.  Now you're in trouble, because A's method will
> > think it owns B's private data!
>
> Yes, I was missing the point. In consequence, drivers must not use
> dev_get_drvdata() to get their references to their private data. It's
> probably necessary to store it in struct sysfs_buffer and include that
> in the store/show callbacks.
> (The same does apply to interfaces of course)
>

Or drivers coudl verify that they still bound to the device they are
about to operate on (psmouse does this by taking a lock on device and
then checking if driver bound is the same address as psmouse). But I'd
rather get rid of all this clutter if we could sever sysfs access
after removing corresponding attributes.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 15:42                           ` Dmitry Torokhov
@ 2007-03-12 15:59                             ` Oliver Neukum
  2007-03-12 16:21                               ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-12 15:59 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 16:42 schrieb Dmitry Torokhov:
> On 3/12/07, Oliver Neukum <oneukum@suse.de> wrote:
> > Am Montag, 12. März 2007 15:57 schrieb Alan Stern
> > > No, you're missing the point.  Let's say driver A's disconnect() is
> > > called, so the driver marks its private data structure as "disconnected"
> > > and does dev_set_drvdata(NULL).  Then driver B is probed and bound to the
> > > device, and it does its own dev_set_drvdata().  Then a user still holding
> > > an open sysfs file reference for driver A calls a show() or store()
> > > method.  The method will do dev_get_drvdata(), receiving the pointer to
> > > driver B's private data.  Now you're in trouble, because A's method will
> > > think it owns B's private data!
> >
> > Yes, I was missing the point. In consequence, drivers must not use
> > dev_get_drvdata() to get their references to their private data. It's
> > probably necessary to store it in struct sysfs_buffer and include that
> > in the store/show callbacks.
> > (The same does apply to interfaces of course)
> >
> 
> Or drivers coudl verify that they still bound to the device they are
> about to operate on (psmouse does this by taking a lock on device and
> then checking if driver bound is the same address as psmouse). But I'd
> rather get rid of all this clutter if we could sever sysfs access
> after removing corresponding attributes.

No, the call has to fail if the driver is rebound to the device.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 15:59                             ` Oliver Neukum
@ 2007-03-12 16:21                               ` Alan Stern
  2007-03-12 18:25                                 ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-12 16:21 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Oliver Neukum wrote:

> > > Yes, I was missing the point. In consequence, drivers must not use
> > > dev_get_drvdata() to get their references to their private data. It's

You do realize how foolish that sounds?  Why do you think 
dev_get_drvdata() was written in the first place?

> > > probably necessary to store it in struct sysfs_buffer and include that
> > > in the store/show callbacks.
> > > (The same does apply to interfaces of course)
> > >
> > 
> > Or drivers coudl verify that they still bound to the device they are
> > about to operate on (psmouse does this by taking a lock on device and
> > then checking if driver bound is the same address as psmouse). But I'd
> > rather get rid of all this clutter if we could sever sysfs access
> > after removing corresponding attributes.
> 
> No, the call has to fail if the driver is rebound to the device.

I'm with Dmitry; the whole thing becomes much, much simpler if we put back
your patch and prevent sysfs access after unregistering an attribute 
file.  No API changes are needed, no driver changes are needed, no radical 
core changes are needed,...  All we would have to do is fix the one SCSI 
method to make it use a workqueue.

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 16:21                               ` Alan Stern
@ 2007-03-12 18:25                                 ` Oliver Neukum
  2007-03-12 19:31                                   ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-12 18:25 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 17:21 schrieb Alan Stern:
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
> 
> > > > Yes, I was missing the point. In consequence, drivers must not use
> > > > dev_get_drvdata() to get their references to their private data. It's
> 
> You do realize how foolish that sounds?  Why do you think 
> dev_get_drvdata() was written in the first place?

It's still useful in disconnect/suspend/resume/etc...
If everything were alright with the design, we wouldn't be discussing
it now, would we?

 
> I'm with Dmitry; the whole thing becomes much, much simpler if we put back
> your patch and prevent sysfs access after unregistering an attribute 
> file.  No API changes are needed, no driver changes are needed, no radical 
> core changes are needed,...  All we would have to do is fix the one SCSI 
> method to make it use a workqueue.

Try. I don't like reverting my own code. But I predict he'll tell you that a
driver's bond with a device should be represented in a data structure
that is to be refcounted.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 18:25                                 ` Oliver Neukum
@ 2007-03-12 19:31                                   ` Alan Stern
  2007-03-12 19:49                                     ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-12 19:31 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Oliver Neukum wrote:

> > I'm with Dmitry; the whole thing becomes much, much simpler if we put back
> > your patch and prevent sysfs access after unregistering an attribute 
> > file.  No API changes are needed, no driver changes are needed, no radical 
> > core changes are needed,...  All we would have to do is fix the one SCSI 
> > method to make it use a workqueue.
> 
> Try.

I did.  Didn't you see this message from Saturday:

http://marc.theaimsgroup.com/?l=linux-kernel&m=117355959020831&w=2

I sent it to Linus as well as to all of you.  No replies received so far 
from anybody.

>  I don't like reverting my own code. But I predict he'll tell you that a
> driver's bond with a device should be represented in a data structure
> that is to be refcounted.

:-)

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 19:31                                   ` Alan Stern
@ 2007-03-12 19:49                                     ` Oliver Neukum
  2007-03-12 20:03                                       ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-12 19:49 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 20:31 schrieb Alan Stern:
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
> 
> > > I'm with Dmitry; the whole thing becomes much, much simpler if we put back
> > > your patch and prevent sysfs access after unregistering an attribute 
> > > file.  No API changes are needed, no driver changes are needed, no radical 
> > > core changes are needed,...  All we would have to do is fix the one SCSI 
> > > method to make it use a workqueue.
> > 
> > Try.
> 
> I did.  Didn't you see this message from Saturday:
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=117355959020831&w=2

Yes. In this case, silence is partial agreement. However, convincing me
is futile if Linus rejects the approach.

I wrote the original patch. But this problem must be solved. If the
first attempt is rejected, I'll try another.

> I sent it to Linus as well as to all of you.  No replies received so far 
> from anybody.
> 
> >  I don't like reverting my own code. But I predict he'll tell you that a
> > driver's bond with a device should be represented in a data structure
> > that is to be refcounted.
> 
> :-)

Coming to think about it, he might be right there.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 19:49                                     ` Oliver Neukum
@ 2007-03-12 20:03                                       ` Alan Stern
  2007-03-12 20:15                                         ` Oliver Neukum
  2007-03-12 20:31                                         ` Dmitry Torokhov
  0 siblings, 2 replies; 61+ messages in thread
From: Alan Stern @ 2007-03-12 20:03 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Oliver Neukum wrote:

> > >  I don't like reverting my own code. But I predict he'll tell you that a
> > > driver's bond with a device should be represented in a data structure
> > > that is to be refcounted.
> > 
> > :-)
> 
> Coming to think about it, he might be right there.

There still would be a synchronization problem.  Refcounts don't solve
races; they only solve lifetime problems.  And you would still have to
change the sysfs API, plus all the other stuff...

Do you think Linus would listen if all three of us (plus maybe Greg) tried 
to convince him?

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 20:03                                       ` Alan Stern
@ 2007-03-12 20:15                                         ` Oliver Neukum
  2007-03-12 20:31                                         ` Dmitry Torokhov
  1 sibling, 0 replies; 61+ messages in thread
From: Oliver Neukum @ 2007-03-12 20:15 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 21:03 schrieb Alan Stern:
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
> 
> > > >  I don't like reverting my own code. But I predict he'll tell you that a
> > > > driver's bond with a device should be represented in a data structure
> > > > that is to be refcounted.
> > > 
> > > :-)
> > 
> > Coming to think about it, he might be right there.
> 
> There still would be a synchronization problem.  Refcounts don't solve
> races; they only solve lifetime problems.  And you would still have to
> change the sysfs API, plus all the other stuff...
> 
> Do you think Linus would listen if all three of us (plus maybe Greg) tried 
> to convince him?

No. He'd tell you that a crap API should be changed.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 20:03                                       ` Alan Stern
  2007-03-12 20:15                                         ` Oliver Neukum
@ 2007-03-12 20:31                                         ` Dmitry Torokhov
  2007-03-12 20:45                                           ` Alan Stern
  2007-03-12 21:31                                           ` Richard Purdie
  1 sibling, 2 replies; 61+ messages in thread
From: Dmitry Torokhov @ 2007-03-12 20:31 UTC (permalink / raw)
  To: Alan Stern; +Cc: Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On 3/12/07, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
>
> > > >  I don't like reverting my own code. But I predict he'll tell you that a
> > > > driver's bond with a device should be represented in a data structure
> > > > that is to be refcounted.
> > >
> > > :-)
> >
> > Coming to think about it, he might be right there.
>
> There still would be a synchronization problem.  Refcounts don't solve
> races; they only solve lifetime problems.  And you would still have to
> change the sysfs API, plus all the other stuff...
>
> Do you think Linus would listen if all three of us (plus maybe Greg) tried
> to convince him?
>

If we'd accompany the argument with the patch that changes scsi to use
wq to perform deletion so we don't have deadlock regression in the
kernel he might be more perceptive... He is right about lifetime
issues but this is not strictly lifetime issue as you correctly point
out. Plus, refcounting also bloats the kernel so I don't relly want to
use refcount for every integer I happen to export through sysfs if I
can simply "revoke" access.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 20:31                                         ` Dmitry Torokhov
@ 2007-03-12 20:45                                           ` Alan Stern
  2007-03-12 21:31                                           ` Richard Purdie
  1 sibling, 0 replies; 61+ messages in thread
From: Alan Stern @ 2007-03-12 20:45 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Dmitry Torokhov wrote:

> > Do you think Linus would listen if all three of us (plus maybe Greg) tried
> > to convince him?
> >
> 
> If we'd accompany the argument with the patch that changes scsi to use
> wq to perform deletion so we don't have deadlock regression in the
> kernel he might be more perceptive...

I wrote that patch over the weekend but forgot to bring it in to work.  
I'll post it tonight or tomorrow.

>  He is right about lifetime
> issues but this is not strictly lifetime issue as you correctly point
> out. Plus, refcounting also bloats the kernel so I don't relly want to
> use refcount for every integer I happen to export through sysfs if I
> can simply "revoke" access.

Agreed.

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 20:31                                         ` Dmitry Torokhov
  2007-03-12 20:45                                           ` Alan Stern
@ 2007-03-12 21:31                                           ` Richard Purdie
  2007-03-13 15:00                                             ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
  1 sibling, 1 reply; 61+ messages in thread
From: Richard Purdie @ 2007-03-12 21:31 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Alan Stern, Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On Mon, 2007-03-12 at 16:31 -0400, Dmitry Torokhov wrote:
> On 3/12/07, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Mon, 12 Mar 2007, Oliver Neukum wrote:
> > > > >  I don't like reverting my own code. But I predict he'll tell you that a
> > > > > driver's bond with a device should be represented in a data structure
> > > > > that is to be refcounted.
> >
> > There still would be a synchronization problem.  Refcounts don't solve
> > races; they only solve lifetime problems.  And you would still have to
> > change the sysfs API, plus all the other stuff...
> >
> > Do you think Linus would listen if all three of us (plus maybe Greg) tried
> > to convince him?
> >
> 
> If we'd accompany the argument with the patch that changes scsi to use
> wq to perform deletion so we don't have deadlock regression in the
> kernel he might be more perceptive... He is right about lifetime
> issues but this is not strictly lifetime issue as you correctly point
> out. Plus, refcounting also bloats the kernel so I don't relly want to
> use refcount for every integer I happen to export through sysfs if I
> can simply "revoke" access.

For what its worth, I think it makes sense if the driver no longer has
to worry about sysfs attributes after they've been removed. This is
something the core should look after, not each and every driver.

http://marc.theaimsgroup.com/?l=linux-kernel&m=117355959020831&w=2

makes a lot of sense, particularly that "No driver callbacks occur after
unregistration". When writing the backlight class code, I remember
checking into this, concluding that seemed to be the design of sysfs and
thinking it a sane design.

The alternative is to force each and every driver to do its own
refcounting. My experience with locking in the extremely simple
backlight class shows nobody reads the documentation or writes the code
correctly. With that, I've given up and added suitable locking to the
core even if not every driver needs it. In doing so, I made a net
removal of a few hundred lines of broken "ticking timebomb" style code.
I dread to think what would happen if every driver had to deal with
sysfs refcounting.

So count me as a vote for handling this in the sysfs core, not the
drivers.

Richard



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-12 21:31                                           ` Richard Purdie
@ 2007-03-13 15:00                                             ` Alan Stern
  2007-03-13 18:42                                               ` Cornelia Huck
  2007-03-13 19:00                                               ` 2.6.21-rc suspend regression: sysfs deadlock Hugh Dickins
  0 siblings, 2 replies; 61+ messages in thread
From: Alan Stern @ 2007-03-13 15:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue. 6 Mar 2007, Hugh Dickins wrote:

> But suspend to RAM still hanging, unless I "chmod a-x /usr/sbin/docker"
> on SuSE 10.2: docker undock tries to unregister /sys/block/sr0 and hangs:
> 
> 60x60         D B0415080     0 10778  10771                     (NOTLB)
>        e8227e04 00000086 e80c60b0 b0415080 ef3f5454 b041dc20 ef3f5430 00000001 
>        e80c60b0 72af360e 00000085 00001941 e80c61bc e8227e00 b01606bf ef47d3c0 
>        ed07c1dc ed07c1e4 00000246 e8227e30 b02f6ef0 e80c60b0 00000001 e80c60b0 
> Call Trace:
>  [<b02f6ef0>] __down+0xaa/0xb8
>  [<b02f6de6>] __down_failed+0xa/0x10
>  [<b0180529>] sysfs_drop_dentry+0xa2/0xda
>  [<b01819b3>] __sysfs_remove_dir+0x6d/0xf8
>  [<b0181a53>] sysfs_remove_dir+0x15/0x20
>  [<b01d49a9>] kobject_del+0x16/0x22
>  [<b0230041>] device_del+0x1c9/0x1e2
>  [<b025705a>] __scsi_remove_device+0x43/0x7a
>  [<b02570b0>] scsi_remove_device+0x1f/0x2b
>  [<b0256a44>] sdev_store_delete+0x16/0x1b
>  [<b022f0a0>] dev_attr_store+0x32/0x34
>  [<b0180931>] flush_write_buffer+0x37/0x3d
>  [<b0180995>] sysfs_write_file+0x5e/0x82
>  [<b01507f5>] vfs_write+0xa7/0x150
>  [<b0150950>] sys_write+0x47/0x6b
>  [<b0103d56>] sysenter_past_esp+0x5f/0x85
>               /usr/lib/dockutils/hooks/thinkpad/60x60 undock
>               /usr/lib/dockutils/dockhandler undock
>               /usr/sbin/docker undock
>               /etc/pm/hooks/23dock suspend
> 
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.
> 
> Suspend no longer deadlocks with the following silly patch, but I expect
> this either pokes a small hole in your scheme, or renders it pointless.
> Maybe that commit needs to be reverted, or maybe you can see how to fix
> it up for -rc3.
> 
> Thanks,
> Hugh
> 
> --- 2.6.21-rc2-git5/fs/sysfs/inode.c	2007-02-28 08:30:26.000000000 
> +0000
> +++ linux/fs/sysfs/inode.c	2007-03-06 18:03:13.000000000 +0000
> @@ -227,11 +227,8 @@ static inline void orphan_all_buffers(st
>  
>  	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
>  	if (node->i_private) {
> -		list_for_each_entry(buf, &set->associates, associates) {
> -			down(&buf->sem);
> +		list_for_each_entry(buf, &set->associates, associates)
>  			buf->orphaned = 1;
> -			up(&buf->sem);
> -		}
>  	}
>  	mutex_unlock(&node->i_mutex);
>  }

Hugh, there has been a long discussion among several people concerning 
this issue.  See for example this thread:

http://marc.info/?t=117335935200001&r=1&w=2

and also:

http://marc.info/?l=linux-kernel&m=117355959020831&w=2

The consensus is that we would be better off keeping Oliver's original 
patch without your silly change, and instead fixing the particular method 
call that deadlocked.  Can you please try out the patch below with 
everything else as it was before?  It should solve your problem.

Alan Stern


Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -452,10 +452,39 @@ store_rescan_field (struct device *dev, 
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
+/* An attribute method cannot unregister itself, so this workaround for
+ * sdev_store_delete() is necessary.
+ */
+struct sdev_work_struct {
+	struct scsi_device *sdev;
+	struct work_struct work;
+};
+
+static void sdev_store_delete_work(struct work_struct *work)
+{
+	struct sdev_work_struct *sdw = container_of(work,
+			struct sdev_work_struct, work);
+
+	scsi_remove_device(sdw->sdev);
+	scsi_device_put(sdw->sdev);
+	kfree(sdw);
+}
+
 static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf,
 				 size_t count)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	struct scsi_device *sdev = to_scsi_device(dev);
+	struct sdev_work_struct *sdw;
+
+	sdw = kmalloc(sizeof(*sdw), GFP_KERNEL);
+	if (!sdw)
+		return -ENOMEM;
+	sdw->sdev = sdev;
+	INIT_WORK(&sdw->work, sdev_store_delete_work);
+	if (scsi_device_get(sdev) != 0)
+		kfree(sdw);
+	else
+		schedule_work(&sdw->work);
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 15:00                                             ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
@ 2007-03-13 18:42                                               ` Cornelia Huck
  2007-03-13 21:20                                                 ` Linus Torvalds
  2007-03-13 19:00                                               ` 2.6.21-rc suspend regression: sysfs deadlock Hugh Dickins
  1 sibling, 1 reply; 61+ messages in thread
From: Cornelia Huck @ 2007-03-13 18:42 UTC (permalink / raw)
  To: Alan Stern
  Cc: Hugh Dickins, Dmitry Torokhov, Oliver Neukum, Maneesh Soni,
	gregkh, Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007 11:00:21 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> The consensus is that we would be better off keeping Oliver's original 
> patch without your silly change, and instead fixing the particular method 
> call that deadlocked.

Another call that deadlocked with Oliver's patch is ungroup for s390
ccwgroup devices. It can be made to work again with a similar patch.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

---
 drivers/s390/cio/ccwgroup.c |   35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

--- linux-2.6.orig/drivers/s390/cio/ccwgroup.c
+++ linux-2.6/drivers/s390/cio/ccwgroup.c
@@ -67,22 +67,49 @@ __ccwgroup_remove_symlinks(struct ccwgro
 	
 }
 
+struct ccwgroup_work_struct {
+	struct ccwgroup_device *gdev;
+	struct work_struct work;
+};
+
+static void ccwgroup_ungroup_work(struct work_struct *work)
+{
+	struct ccwgroup_work_struct *ungroup_work
+		= container_of(work, struct ccwgroup_work_struct, work);
+
+	__ccwgroup_remove_symlinks(ungroup_work->gdev);
+	device_unregister(&ungroup_work->gdev->dev);
+	put_device(&ungroup_work->gdev->dev);
+	kfree(ungroup_work);
+}
+
 /*
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
+ * Note that we cannot unregister the device from one of its attribute
+ * methods, so we have to delay it.
  */
-static ssize_t
-ccwgroup_ungroup_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
+static ssize_t ccwgroup_ungroup_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
 {
 	struct ccwgroup_device *gdev;
+	struct ccwgroup_work_struct *ungroup_work;
 
 	gdev = to_ccwgroupdev(dev);
 
 	if (gdev->state != CCWGROUP_OFFLINE)
 		return -EINVAL;
 
-	__ccwgroup_remove_symlinks(gdev);
-	device_unregister(dev);
+	ungroup_work = kmalloc(sizeof(*ungroup_work), GFP_KERNEL);
+	if (!ungroup_work)
+		return -ENOMEM;
+	ungroup_work->gdev = gdev;
+	INIT_WORK(&ungroup_work->work, ccwgroup_ungroup_work);
+	if (!get_device(&gdev->dev))
+		kfree(ungroup_work);
+	else
+		schedule_work(&ungroup_work->work);
 
 	return count;
 }

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 15:00                                             ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
  2007-03-13 18:42                                               ` Cornelia Huck
@ 2007-03-13 19:00                                               ` Hugh Dickins
  2007-03-13 20:09                                                 ` Alan Stern
  1 sibling, 1 reply; 61+ messages in thread
From: Hugh Dickins @ 2007-03-13 19:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Alan Stern wrote:
> 
> The consensus is that we would be better off keeping Oliver's original 
> patch without your silly change, and instead fixing the particular method 
> call that deadlocked.  Can you please try out the patch below with 
> everything else as it was before?  It should solve your problem.

Yep, it works fine with your patch in and my silly reverted, thanks.
But (I was about to say, even before seeing Cornelia's reply, honest!)
I think you do need to check (audit the source? or is some runtime
check possible?) for other such "suicidal" sysfs files, which
seemed to (sysfs-ignorant) me to pose the real problem.

Hugh

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 19:00                                               ` 2.6.21-rc suspend regression: sysfs deadlock Hugh Dickins
@ 2007-03-13 20:09                                                 ` Alan Stern
  2007-03-13 20:55                                                   ` Hugh Dickins
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-13 20:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Hugh Dickins wrote:

> On Tue, 13 Mar 2007, Alan Stern wrote:
> > 
> > The consensus is that we would be better off keeping Oliver's original 
> > patch without your silly change, and instead fixing the particular method 
> > call that deadlocked.  Can you please try out the patch below with 
> > everything else as it was before?  It should solve your problem.
> 
> Yep, it works fine with your patch in and my silly reverted, thanks.
> But (I was about to say, even before seeing Cornelia's reply, honest!)
> I think you do need to check (audit the source? or is some runtime
> check possible?) for other such "suicidal" sysfs files, which
> seemed to (sysfs-ignorant) me to pose the real problem.

A runtime check wouldn't detect anything until someone tried to use the 
file -- at which point the process would deadlock anyway.

On the other hand, a quick survey of the kernel source shows that
DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
for the faint-of-heart!

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 20:09                                                 ` Alan Stern
@ 2007-03-13 20:55                                                   ` Hugh Dickins
  2007-03-13 21:08                                                     ` Dmitry Torokhov
  2007-03-13 21:20                                                     ` Alan Stern
  0 siblings, 2 replies; 61+ messages in thread
From: Hugh Dickins @ 2007-03-13 20:55 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Alan Stern wrote:
> 
> On the other hand, a quick survey of the kernel source shows that
> DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> for the faint-of-heart!

Indeed, and faint-hearted Hugh wasn't intending to do so: but
stout-hearted Alan will need to, won't he, before his patch can go in?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 20:55                                                   ` Hugh Dickins
@ 2007-03-13 21:08                                                     ` Dmitry Torokhov
  2007-03-13 21:20                                                     ` Alan Stern
  1 sibling, 0 replies; 61+ messages in thread
From: Dmitry Torokhov @ 2007-03-13 21:08 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Stern, Oliver Neukum, Maneesh Soni, gregkh, Richard Purdie,
	James Bottomley, Linus Torvalds, Kernel development list

On 3/13/07, Hugh Dickins <hugh@veritas.com> wrote:
> On Tue, 13 Mar 2007, Alan Stern wrote:
> >
> > On the other hand, a quick survey of the kernel source shows that
> > DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> > for the faint-of-heart!
>
> Indeed, and faint-hearted Hugh wasn't intending to do so: but
> stout-hearted Alan will need to, won't he, before his patch can go in?
>

I think we could rely on subsystems maintainers to let us know if
there are potential problems. For example I can tell that neither
input, serio nor gameport subsystems use sysfs to destroy their
devices (action on sysfs may cause some other device to be destroyed
but that should be ok, only self-destruction is not allowed, right?)

-- 
Dmitry

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 18:42                                               ` Cornelia Huck
@ 2007-03-13 21:20                                                 ` Linus Torvalds
  2007-03-14 16:12                                                   ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-03-13 21:20 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alan Stern, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list



On Tue, 13 Mar 2007, Cornelia Huck wrote:
> 
> Another call that deadlocked with Oliver's patch is ungroup for s390
> ccwgroup devices. It can be made to work again with a similar patch.

Could we please make this easier to use by having some common sysfs helper 
routine for this kind of "delayed_store()" functionality.

I'm not a huge fan of delayed work at all, but if we have to have it, at 
least make it one generic function rather than having multiple functions 
all doing their own workqueue logic for it.

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 20:55                                                   ` Hugh Dickins
  2007-03-13 21:08                                                     ` Dmitry Torokhov
@ 2007-03-13 21:20                                                     ` Alan Stern
  1 sibling, 0 replies; 61+ messages in thread
From: Alan Stern @ 2007-03-13 21:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Hugh Dickins wrote:

> On Tue, 13 Mar 2007, Alan Stern wrote:
> > 
> > On the other hand, a quick survey of the kernel source shows that
> > DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> > for the faint-of-heart!
> 
> Indeed, and faint-hearted Hugh wasn't intending to do so: but
> stout-hearted Alan will need to, won't he, before his patch can go in?

Allow me to point out that the original patch is Oliver's (although I
helped), and it doesn't need to go in -- it needs not to be removed.

Furthermore, I have better things to do with the next month of my time 
than auditing hundreds of routines I don't understand for behavior I 
probably won't be able to recognize.  (Although at 50 a day... hmmm, 
maybe.)

This sounds more like a job for kernel-janitors!


On Tue, 13 Mar 2007, Dmitry Torokhov wrote:

> I think we could rely on subsystems maintainers to let us know if
> there are potential problems. For example I can tell that neither
> input, serio nor gameport subsystems use sysfs to destroy their  
> devices (action on sysfs may cause some other device to be destroyed
> but that should be ok, only self-destruction is not allowed, right?)

Very good points.  USB doesn't do anything like that either.  And right, 
it's okay for a method to destroy other devices; it just can't do anything 
that would lead to its own unregistration.

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 21:20                                                 ` Linus Torvalds
@ 2007-03-14 16:12                                                   ` Alan Stern
  2007-03-14 18:43                                                     ` Cornelia Huck
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-14 16:12 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Tue, 13 Mar 2007, Linus Torvalds wrote:

> Could we please make this easier to use by having some common sysfs helper 
> routine for this kind of "delayed_store()" functionality.
> 
> I'm not a huge fan of delayed work at all, but if we have to have it, at 
> least make it one generic function rather than having multiple functions 
> all doing their own workqueue logic for it.

This seems more elegant (not yet tested).  Cornelia, does it look okay to 
you?

Alan Stern


Index: usb-2.6/include/linux/sysfs.h
===================================================================
--- usb-2.6.orig/include/linux/sysfs.h
+++ usb-2.6/include/linux/sysfs.h
@@ -78,6 +78,9 @@ struct sysfs_ops {
 
 #ifdef CONFIG_SYSFS
 
+extern int sysfs_access_in_other_task(struct kobject *kobj,
+		void (*func)(void *), void *data);
+
 extern int __must_check
 sysfs_create_dir(struct kobject *, struct dentry *);
 
@@ -133,6 +136,12 @@ extern int __must_check sysfs_init(void)
 
 #else /* CONFIG_SYSFS */
 
+static inline int sysfs_access_in_other_task(struct kobject *kobj,
+		void (*func)(void *), void *data)
+{
+	return -ENOSYS;
+}
+
 static inline int sysfs_create_dir(struct kobject * k, struct dentry *shadow)
 {
 	return 0;
Index: usb-2.6/fs/sysfs/file.c
===================================================================
--- usb-2.6.orig/fs/sysfs/file.c
+++ usb-2.6/fs/sysfs/file.c
@@ -643,6 +643,59 @@ void sysfs_remove_file_from_group(struct
 }
 EXPORT_SYMBOL_GPL(sysfs_remove_file_from_group);
 
+struct other_task_struct {
+	struct kobject 		*kobj;
+	void			(*func)(void *);
+	void			*data;
+	struct work_struct	work;
+};
+
+static void other_task_work(struct work_struct *work)
+{
+	struct other_task_struct *ots = container_of(work,
+			struct other_task_struct, work);
+
+	(ots->func)(ots->data);
+	kobject_put(ots->kobj);
+	kfree(ots);
+}
+
+/**
+ * sysfs_access_in_other_task - delay access from an attribute method.
+ * @kobj: object we're acting for.
+ * @func: callback function to invoke later.
+ * @data: argument to pass to @func.
+ *
+ * sysfs attribute methods must not unregister themselves or their parent
+ * kobject (which would amount to the same thing).  Attempts to do so will
+ * deadlock, since unregistration is mutually exclusive with driver
+ * callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to carry out the requested function
+ * in the workqueue's process context.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int sysfs_access_in_other_task(struct kobject *kobj, void (*func)(void *),
+		void *data)
+{
+	struct other_task_struct *ots;
+
+	ots = kmalloc(sizeof(*ots), GFP_KERNEL);
+	if (!ots)
+		return -ENOMEM;
+	kobject_get(kobj);
+	ots->kobj = kobj;
+	ots->func = func;
+	ots->data = data;
+	INIT_WORK(&ots->work, other_task_work);
+	schedule_work(&ots->work);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sysfs_access_in_other_task);
+
 
 EXPORT_SYMBOL_GPL(sysfs_create_file);
 EXPORT_SYMBOL_GPL(sysfs_remove_file);
Index: usb-2.6/include/linux/device.h
===================================================================
--- usb-2.6.orig/include/linux/device.h
+++ usb-2.6/include/linux/device.h
@@ -356,6 +356,8 @@ extern int __must_check device_create_bi
 					       struct bin_attribute *attr);
 extern void device_remove_bin_file(struct device *dev,
 				   struct bin_attribute *attr);
+extern int device_access_in_other_task(struct device *dev,
+		void (*func)(struct device *));
 
 /* device resource management */
 typedef void (*dr_release_t)(struct device *dev, void *res);
Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -407,6 +407,30 @@ void device_remove_bin_file(struct devic
 }
 EXPORT_SYMBOL_GPL(device_remove_bin_file);
 
+/**
+ * device_access_in_other_task - delay access from an attribute method.
+ * @dev: device.
+ * @func: callback function to invoke later.
+ *
+ * Attribute methods must not unregister themselves or their parent device
+ * (which would amount to the same thing).  Attempts to do so will deadlock,
+ * since unregistration is mutually exclusive with driver callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to carry out the requested function
+ * in the workqueue's process context.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int device_access_in_other_task(struct device *dev,
+		void (*func)(struct device *))
+{
+	return sysfs_access_in_other_task(&dev->kobj,
+			(void (*)(void *)) func, dev);
+}
+EXPORT_SYMBOL_GPL(device_access_in_other_task);
+
 static void klist_children_get(struct klist_node *n)
 {
 	struct device *dev = container_of(n, struct device, knode_parent);
Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -452,10 +452,22 @@ store_rescan_field (struct device *dev, 
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
+static void sdev_store_delete_callback(struct device *dev)
+{
+	scsi_remove_device(to_scsi_device(dev));
+}
+
 static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf,
 				 size_t count)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	int rc;
+
+	/* An attribute cannot be unregistered by one of its own methods,
+	 * so we have to use device_access_in_other_task().
+	 */
+	rc = device_access_in_other_task(dev, sdev_store_delete_callback);
+	if (rc)
+		count = rc;
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
Index: usb-2.6/drivers/s390/cio/ccwgroup.c
===================================================================
--- usb-2.6.orig/drivers/s390/cio/ccwgroup.c
+++ usb-2.6/drivers/s390/cio/ccwgroup.c
@@ -71,19 +71,31 @@ __ccwgroup_remove_symlinks(struct ccwgro
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
  */
+static void ccwgroup_ungroup_callback(struct device *dev)
+{
+	struct ccwgroup_device *gdev = to_ccwgroupdev(dev);
+
+	__ccwgroup_remove_symlinks(gdev);
+	device_unregister(dev);
+}
+
 static ssize_t
 ccwgroup_ungroup_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
 {
 	struct ccwgroup_device *gdev;
+	int rc;
 
 	gdev = to_ccwgroupdev(dev);
 
 	if (gdev->state != CCWGROUP_OFFLINE)
 		return -EINVAL;
 
-	__ccwgroup_remove_symlinks(gdev);
-	device_unregister(dev);
-
+	/* Note that we cannot unregister the device from one of its
+	 * attribute methods, so we have to delay it.
+	 */
+	rc = device_access_in_other_task(dev, ccwgroup_ungroup_callback);
+	if (rc)
+		count = rc;
 	return count;
 }
 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-14 16:12                                                   ` Alan Stern
@ 2007-03-14 18:43                                                     ` Cornelia Huck
  2007-03-14 19:23                                                       ` Alan Stern
  0 siblings, 1 reply; 61+ messages in thread
From: Cornelia Huck @ 2007-03-14 18:43 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Wed, 14 Mar 2007 12:12:37 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> This seems more elegant (not yet tested).  Cornelia, does it look okay to 
> you?

Works for me (grouping & ungrouping ctc) and looks sane. Some more
comments below.


> +struct other_task_struct {
> +	struct kobject 		*kobj;
> +	void			(*func)(void *);
> +	void			*data;
> +	struct work_struct	work;
> +};
> +
> +static void other_task_work(struct work_struct *work)
> +{
> +	struct other_task_struct *ots = container_of(work,
> +			struct other_task_struct, work);
> +
> +	(ots->func)(ots->data);
> +	kobject_put(ots->kobj);
> +	kfree(ots);
> +}

The naming seems a bit unintuitive, but I don't have a good
alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?

> +
> +/**
> + * sysfs_access_in_other_task - delay access from an attribute method.
> + * @kobj: object we're acting for.
> + * @func: callback function to invoke later.
> + * @data: argument to pass to @func.
> + *
> + * sysfs attribute methods must not unregister themselves or their parent
> + * kobject (which would amount to the same thing).  Attempts to do so will
> + * deadlock, since unregistration is mutually exclusive with driver
> + * callbacks.
> + *
> + * Instead methods can call this routine, which will attempt to allocate
> + * and schedule a workqueue request to carry out the requested function
> + * in the workqueue's process context.
> + *
> + * Returns 0 if the request was submitted, -ENOMEM if storage could not
> + * be allocated.
> + */
> +int sysfs_access_in_other_task(struct kobject *kobj, void (*func)(void *),
> +		void *data)

sysfs_delay_access()?


> +/**
> + * device_access_in_other_task - delay access from an attribute method.
> + * @dev: device.
> + * @func: callback function to invoke later.
> + *
> + * Attribute methods must not unregister themselves or their parent device
> + * (which would amount to the same thing).  Attempts to do so will deadlock,
> + * since unregistration is mutually exclusive with driver callbacks.
> + *
> + * Instead methods can call this routine, which will attempt to allocate
> + * and schedule a workqueue request to carry out the requested function
> + * in the workqueue's process context.
> + *
> + * Returns 0 if the request was submitted, -ENOMEM if storage could not
> + * be allocated.
> + */
> +int device_access_in_other_task(struct device *dev,
> +		void (*func)(struct device *))
> +{
> +	return sysfs_access_in_other_task(&dev->kobj,
> +			(void (*)(void *)) func, dev);
> +}
> +EXPORT_SYMBOL_GPL(device_access_in_other_task);

device_delay_access()?


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-14 18:43                                                     ` Cornelia Huck
@ 2007-03-14 19:23                                                       ` Alan Stern
  2007-03-15 10:27                                                         ` Cornelia Huck
  0 siblings, 1 reply; 61+ messages in thread
From: Alan Stern @ 2007-03-14 19:23 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Wed, 14 Mar 2007, Cornelia Huck wrote:

> On Wed, 14 Mar 2007 12:12:37 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> > This seems more elegant (not yet tested).  Cornelia, does it look okay to 
> > you?
> 
> Works for me (grouping & ungrouping ctc) and looks sane. Some more
> comments below.

Thank you.

> > +struct other_task_struct {
> > +	struct kobject 		*kobj;
> > +	void			(*func)(void *);
> > +	void			*data;
> > +	struct work_struct	work;
> > +};
> > +
> > +static void other_task_work(struct work_struct *work)
> > +{
> > +	struct other_task_struct *ots = container_of(work,
> > +			struct other_task_struct, work);
> > +
> > +	(ots->func)(ots->data);
> > +	kobject_put(ots->kobj);
> > +	kfree(ots);
> > +}
> 
> The naming seems a bit unintuitive, but I don't have a good
> alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?

sysfs_work_struct is too generic; other parts of sysfs might also want to
use workqueues for different purposes.

I don't like calling it "delayed"-anything, because the operations aren't
necessarily delayed!  On an SMP system they might even execute before the
sysfs_access_in_other_task() call returns.  (Although the two examples we
have so far can't do that because of lock contention.)

The major feature added here is that the work takes place in a different 
task's context, not that it is delayed.  Hence the choice of names.

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-14 19:23                                                       ` Alan Stern
@ 2007-03-15 10:27                                                         ` Cornelia Huck
  2007-03-15 12:31                                                           ` Hugh Dickins
  2007-03-15 14:27                                                           ` Alan Stern
  0 siblings, 2 replies; 61+ messages in thread
From: Cornelia Huck @ 2007-03-15 10:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Wed, 14 Mar 2007 15:23:10 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> > > +struct other_task_struct {
> > > +	struct kobject 		*kobj;
> > > +	void			(*func)(void *);
> > > +	void			*data;
> > > +	struct work_struct	work;
> > > +};
> > > +
> > > +static void other_task_work(struct work_struct *work)
> > > +{
> > > +	struct other_task_struct *ots = container_of(work,
> > > +			struct other_task_struct, work);
> > > +
> > > +	(ots->func)(ots->data);
> > > +	kobject_put(ots->kobj);
> > > +	kfree(ots);
> > > +}
> > 
> > The naming seems a bit unintuitive, but I don't have a good
> > alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?
> 
> sysfs_work_struct is too generic; other parts of sysfs might also want to
> use workqueues for different purposes.

> I don't like calling it "delayed"-anything, because the operations aren't
> necessarily delayed!  On an SMP system they might even execute before the
> sysfs_access_in_other_task() call returns.  (Although the two examples we
> have so far can't do that because of lock contention.)

Sure. But then you shouldn't refer to "delay" in the comments for the
functions as well :)

> The major feature added here is that the work takes place in a different 
> task's context, not that it is delayed.  Hence the choice of names.

Hm. Perhaps device_schedule_access()?

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 10:27                                                         ` Cornelia Huck
@ 2007-03-15 12:31                                                           ` Hugh Dickins
  2007-03-15 13:02                                                             ` Oliver Neukum
  2007-03-15 14:27                                                           ` Alan Stern
  1 sibling, 1 reply; 61+ messages in thread
From: Hugh Dickins @ 2007-03-15 12:31 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alan Stern, Linus Torvalds, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Cornelia Huck wrote:
> On Wed, 14 Mar 2007 15:23:10 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
> > 
> > sysfs_work_struct is too generic; other parts of sysfs might also want to
> > use workqueues for different purposes.
> 
> > I don't like calling it "delayed"-anything, because the operations aren't
> > necessarily delayed!  On an SMP system they might even execute before the
> > sysfs_access_in_other_task() call returns.  (Although the two examples we
> > have so far can't do that because of lock contention.)
> 
> Sure. But then you shouldn't refer to "delay" in the comments for the
> functions as well :)
> 
> > The major feature added here is that the work takes place in a different 
> > task's context, not that it is delayed.  Hence the choice of names.
> 
> Hm. Perhaps device_schedule_access()?

It's really none of my business, I'm merely the reporter the
deadlock being fixed, and I don't know my way around sysfs at all ...

... but I have to say I share your discomfort with Alan's
"sysfs_access_in_other_task" naming, it sounded very weird to me.

Quite apart from this mysterious "other task", I don't understand
"access" either.

Perhaps "defer" would best capture the idea of another-task and
maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?

Hugh

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 12:31                                                           ` Hugh Dickins
@ 2007-03-15 13:02                                                             ` Oliver Neukum
  2007-03-15 13:22                                                               ` Dmitry Torokhov
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-15 13:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Cornelia Huck, Alan Stern, Linus Torvalds, Dmitry Torokhov,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

Am Donnerstag, 15. März 2007 13:31 schrieb Hugh Dickins:
> Quite apart from this mysterious "other task", I don't understand
> "access" either.
> 
> Perhaps "defer" would best capture the idea of another-task and
> maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?

But we do not wish to defer or delay anything.
How about: sysfs_action_from_neutral_context

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 13:02                                                             ` Oliver Neukum
@ 2007-03-15 13:22                                                               ` Dmitry Torokhov
  2007-03-15 13:59                                                                 ` Hugh Dickins
  0 siblings, 1 reply; 61+ messages in thread
From: Dmitry Torokhov @ 2007-03-15 13:22 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hugh Dickins, Cornelia Huck, Alan Stern, Linus Torvalds,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On 3/15/07, Oliver Neukum <oneukum@suse.de> wrote:
> Am Donnerstag, 15. März 2007 13:31 schrieb Hugh Dickins:
> > Quite apart from this mysterious "other task", I don't understand
> > "access" either.
> >
> > Perhaps "defer" would best capture the idea of another-task and
> > maybe-delay? sysfs_defer_work(), struct sysfs_deferred_work?
>
> But we do not wish to defer or delay anything.
> How about: sysfs_action_from_neutral_context
>

How about sysfs_schedule_work? That is what it does - schedules a work
on a sysfs object and everyone here knows what schedule_work() does.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 13:22                                                               ` Dmitry Torokhov
@ 2007-03-15 13:59                                                                 ` Hugh Dickins
  0 siblings, 0 replies; 61+ messages in thread
From: Hugh Dickins @ 2007-03-15 13:59 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Oliver Neukum, Cornelia Huck, Alan Stern, Linus Torvalds,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Dmitry Torokhov wrote:
> 
> How about sysfs_schedule_work? That is what it does - schedules a work
> on a sysfs object and everyone here knows what schedule_work() does.

I'm ashamed to have suggested anything else: certainly gets my vote.

Hugh

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 10:27                                                         ` Cornelia Huck
  2007-03-15 12:31                                                           ` Hugh Dickins
@ 2007-03-15 14:27                                                           ` Alan Stern
  2007-03-15 15:32                                                             ` Cornelia Huck
  2007-03-15 16:29                                                             ` Hugh Dickins
  1 sibling, 2 replies; 61+ messages in thread
From: Alan Stern @ 2007-03-15 14:27 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Cornelia Huck wrote:

> > > The naming seems a bit unintuitive, but I don't have a good
> > > alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?
> > 
> > sysfs_work_struct is too generic; other parts of sysfs might also want to
> > use workqueues for different purposes.
> 
> > I don't like calling it "delayed"-anything, because the operations aren't
> > necessarily delayed!  On an SMP system they might even execute before the
> > sysfs_access_in_other_task() call returns.  (Although the two examples we
> > have so far can't do that because of lock contention.)
> 
> Sure. But then you shouldn't refer to "delay" in the comments for the
> functions as well :)

Fair enough.  One use of "delay" is in a comment you wrote; I'll change it 
as well.

> > The major feature added here is that the work takes place in a different 
> > task's context, not that it is delayed.  Hence the choice of names.
> 
> Hm. Perhaps device_schedule_access()?

On Thu, 15 Mar 2007, Hugh Dickins wrote:

> It's really none of my business, I'm merely the reporter the
> deadlock being fixed, and I don't know my way around sysfs at all ...
> 
> ... but I have to say I share your discomfort with Alan's
> "sysfs_access_in_other_task" naming, it sounded very weird to me.
> 
> Quite apart from this mysterious "other task", I don't understand
> "access" either.
> 
> Perhaps "defer" would best capture the idea of another-task and
> maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?  

On Thu, 15 Mar 2007, Oliver Neukum wrote:

> But we do not wish to defer or delay anything.
> How about: sysfs_action_from_neutral_context  

On Thu, 15 Mar 2007, Dmitry Torokhov wrote:

> How about sysfs_schedule_work? That is what it does - schedules a work
> on a sysfs object and everyone here knows what schedule_work() does.  

On Thu, 15 Mar 2007, Hugh Dickins wrote:

> I'm ashamed to have suggested anything else: certainly gets my vote.

Personally I don't understand what was wrong with my name.  What's weird 
or unintuitive about doing something in a different task's context?

Dmitry's suggestion is slightly inappropriate because the function doesn't
take a workstruct as an argument and it isn't itself a workqueue callback.  

Would people be happier with sysfs_schedule_callback() and
device_schedule_callback()?  At least the functions do take a callback 
pointer as an argument, even though they aren't callbacks themselves.

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 14:27                                                           ` Alan Stern
@ 2007-03-15 15:32                                                             ` Cornelia Huck
  2007-03-15 16:29                                                             ` Hugh Dickins
  1 sibling, 0 replies; 61+ messages in thread
From: Cornelia Huck @ 2007-03-15 15:32 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007 10:27:19 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> Fair enough.  One use of "delay" is in a comment you wrote; I'll change it 
> as well.

Fine with me.

> Would people be happier with sysfs_schedule_callback() and
> device_schedule_callback()?  At least the functions do take a callback 
> pointer as an argument, even though they aren't callbacks themselves.

Count one happy person here.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 14:27                                                           ` Alan Stern
  2007-03-15 15:32                                                             ` Cornelia Huck
@ 2007-03-15 16:29                                                             ` Hugh Dickins
  2007-03-15 16:51                                                               ` Linus Torvalds
  1 sibling, 1 reply; 61+ messages in thread
From: Hugh Dickins @ 2007-03-15 16:29 UTC (permalink / raw)
  To: Alan Stern
  Cc: Cornelia Huck, Linus Torvalds, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Alan Stern wrote:
> 
> Personally I don't understand what was wrong with my name.  What's weird 
> or unintuitive about doing something in a different task's context?

The only thing wrong with sysfs_do_something_in_a_different_task_context()
is the length of the name.  "do", that's good, much better than "access".

sysfs_access_in_other_task() left me wondering what this "other" task
was, and what kind of "access" it's trying to get - or is the calling
task the other, and it's trying to access something it wouldn't
directly have access to?

> 
> Dmitry's suggestion is slightly inappropriate because the function doesn't
> take a workstruct as an argument and it isn't itself a workqueue callback.  

True, though since he's saying "work" rather than "workstruct",
I was okay with that: it's a sysfs wrapper to schedule_work().

> 
> Would people be happier with sysfs_schedule_callback() and
> device_schedule_callback()?  At least the functions do take a callback 
> pointer as an argument, even though they aren't callbacks themselves.

A lot happier than with sysfs_access_in_other_task() -
if you prefer this to Dmitry's, it's okay by me.

Hugh

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 16:29                                                             ` Hugh Dickins
@ 2007-03-15 16:51                                                               ` Linus Torvalds
  2007-03-15 19:50                                                                 ` [PATCH] sysfs and driver core: add callback helper, used by SCSI and S390 Alan Stern
  2007-03-15 19:51                                                                 ` [PATCH] sysfs: reinstate exclusion between method calls and attribute unregistration Alan Stern
  0 siblings, 2 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-03-15 16:51 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Stern, Cornelia Huck, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list



On Thu, 15 Mar 2007, Hugh Dickins wrote:
> 
> sysfs_access_in_other_task() left me wondering what this "other" task
> was, and what kind of "access" it's trying to get - or is the calling
> task the other, and it's trying to access something it wouldn't
> directly have access to?

For naming clashes, I'd suggest:

 - try to name according to *why* something is done, not necessarily what 
   it does.

   For example, is it really in "another task"? Maybe it's just an 
   on-demand thread of the same task?  Do you actually care how the 
   deferred work is done?

 - avoid being vague. I agree with not liking the name much, and the 
   "other" thing bothers me. Like Hugh, it makes me ask "_What_ other 
   task?"

So I would suggest not concentrating on some implementation issue, but on 
the reason why you need it in the first place. Namely that you want to 
defer the actual action to avoid deadlock due to recursive locking. So 
that "why do I actually do this" thing implies something like 
"sysfs_store_async()" or "sysfs_store_deferred()" or maybe actually 
concentrate on the locking angle and say something like 
"sysfs_store_needs_to_reacquire_lock()".

(That last one wasn't really serious - it's too long and cumbersome, but 
it's an example of not caring _how_ you do it, just abotu what you want 
done).

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH] sysfs and driver core: add callback helper, used by SCSI and S390
  2007-03-15 16:51                                                               ` Linus Torvalds
@ 2007-03-15 19:50                                                                 ` Alan Stern
  2007-03-15 19:51                                                                 ` [PATCH] sysfs: reinstate exclusion between method calls and attribute unregistration Alan Stern
  1 sibling, 0 replies; 61+ messages in thread
From: Alan Stern @ 2007-03-15 19:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Cornelia Huck, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	SCSI development list, linux-s390, Kernel development list

This patch (as868) adds a helper routine for device drivers that need
to set up a callback to perform some action in a different process's
context.  This is intended for use by attribute methods that want to
unregister themselves or their parent device.  Attribute method calls
are mutually exclusive with unregistration, so such actions cannot be
taken directly.

Two attribute methods are converted to use the new helper routine: one
for SCSI device deletion and one for System/390 ccwgroup devices.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

---

Index: 2.6.21-rc3-git9/include/linux/sysfs.h
===================================================================
--- 2.6.21-rc3-git9.orig/include/linux/sysfs.h
+++ 2.6.21-rc3-git9/include/linux/sysfs.h
@@ -78,6 +78,9 @@ struct sysfs_ops {
 
 #ifdef CONFIG_SYSFS
 
+extern int sysfs_schedule_callback(struct kobject *kobj,
+		void (*func)(void *), void *data);
+
 extern int __must_check
 sysfs_create_dir(struct kobject *, struct dentry *);
 
@@ -132,6 +135,12 @@ extern int __must_check sysfs_init(void)
 
 #else /* CONFIG_SYSFS */
 
+static inline int sysfs_schedule_callback(struct kobject *kobj,
+		void (*func)(void *), void *data)
+{
+	return -ENOSYS;
+}
+
 static inline int sysfs_create_dir(struct kobject * k, struct dentry *shadow)
 {
 	return 0;
Index: 2.6.21-rc3-git9/fs/sysfs/file.c
===================================================================
--- 2.6.21-rc3-git9.orig/fs/sysfs/file.c
+++ 2.6.21-rc3-git9/fs/sysfs/file.c
@@ -629,6 +629,60 @@ void sysfs_remove_file_from_group(struct
 }
 EXPORT_SYMBOL_GPL(sysfs_remove_file_from_group);
 
+struct sysfs_schedule_callback_struct {
+	struct kobject 		*kobj;
+	void			(*func)(void *);
+	void			*data;
+	struct work_struct	work;
+};
+
+static void sysfs_schedule_callback_work(struct work_struct *work)
+{
+	struct sysfs_schedule_callback_struct *ss = container_of(work,
+			struct sysfs_schedule_callback_struct, work);
+
+	(ss->func)(ss->data);
+	kobject_put(ss->kobj);
+	kfree(ss);
+}
+
+/**
+ * sysfs_schedule_callback - helper to schedule a callback for a kobject
+ * @kobj: object we're acting for.
+ * @func: callback function to invoke later.
+ * @data: argument to pass to @func.
+ *
+ * sysfs attribute methods must not unregister themselves or their parent
+ * kobject (which would amount to the same thing).  Attempts to do so will
+ * deadlock, since unregistration is mutually exclusive with driver
+ * callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to call back @func with @data as its
+ * argument in the workqueue's process context.  @kobj will be pinned
+ * until @func returns.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *),
+		void *data)
+{
+	struct sysfs_schedule_callback_struct *ss;
+
+	ss = kmalloc(sizeof(*ss), GFP_KERNEL);
+	if (!ss)
+		return -ENOMEM;
+	kobject_get(kobj);
+	ss->kobj = kobj;
+	ss->func = func;
+	ss->data = data;
+	INIT_WORK(&ss->work, sysfs_schedule_callback_work);
+	schedule_work(&ss->work);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sysfs_schedule_callback);
+
 
 EXPORT_SYMBOL_GPL(sysfs_create_file);
 EXPORT_SYMBOL_GPL(sysfs_remove_file);
Index: 2.6.21-rc3-git9/include/linux/device.h
===================================================================
--- 2.6.21-rc3-git9.orig/include/linux/device.h
+++ 2.6.21-rc3-git9/include/linux/device.h
@@ -353,6 +353,8 @@ extern int __must_check device_create_bi
 					       struct bin_attribute *attr);
 extern void device_remove_bin_file(struct device *dev,
 				   struct bin_attribute *attr);
+extern int device_schedule_callback(struct device *dev,
+		void (*func)(struct device *));
 
 /* device resource management */
 typedef void (*dr_release_t)(struct device *dev, void *res);
Index: 2.6.21-rc3-git9/drivers/base/core.c
===================================================================
--- 2.6.21-rc3-git9.orig/drivers/base/core.c
+++ 2.6.21-rc3-git9/drivers/base/core.c
@@ -407,6 +407,35 @@ void device_remove_bin_file(struct devic
 }
 EXPORT_SYMBOL_GPL(device_remove_bin_file);
 
+/**
+ * device_schedule_callback - helper to schedule a callback for a device
+ * @dev: device.
+ * @func: callback function to invoke later.
+ *
+ * Attribute methods must not unregister themselves or their parent device
+ * (which would amount to the same thing).  Attempts to do so will deadlock,
+ * since unregistration is mutually exclusive with driver callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to call back @func with @dev as its
+ * argument in the workqueue's process context.  @dev will be pinned until
+ * @func returns.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ *
+ * NOTE: This routine won't work if CONFIG_SYSFS isn't set!  It uses an
+ * underlying sysfs routine (since it is intended for use by attribute
+ * methods), and if sysfs isn't available you'll get nothing but -ENOSYS.
+ */
+int device_schedule_callback(struct device *dev,
+		void (*func)(struct device *))
+{
+	return sysfs_schedule_callback(&dev->kobj,
+			(void (*)(void *)) func, dev);
+}
+EXPORT_SYMBOL_GPL(device_schedule_callback);
+
 static void klist_children_get(struct klist_node *n)
 {
 	struct device *dev = container_of(n, struct device, knode_parent);
Index: 2.6.21-rc3-git9/drivers/scsi/scsi_sysfs.c
===================================================================
--- 2.6.21-rc3-git9.orig/drivers/scsi/scsi_sysfs.c
+++ 2.6.21-rc3-git9/drivers/scsi/scsi_sysfs.c
@@ -452,10 +452,22 @@ store_rescan_field (struct device *dev, 
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
+static void sdev_store_delete_callback(struct device *dev)
+{
+	scsi_remove_device(to_scsi_device(dev));
+}
+
 static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf,
 				 size_t count)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	int rc;
+
+	/* An attribute cannot be unregistered by one of its own methods,
+	 * so we have to use this roundabout approach.
+	 */
+	rc = device_schedule_callback(dev, sdev_store_delete_callback);
+	if (rc)
+		count = rc;
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
Index: 2.6.21-rc3-git9/drivers/s390/cio/ccwgroup.c
===================================================================
--- 2.6.21-rc3-git9.orig/drivers/s390/cio/ccwgroup.c
+++ 2.6.21-rc3-git9/drivers/s390/cio/ccwgroup.c
@@ -71,19 +71,31 @@ __ccwgroup_remove_symlinks(struct ccwgro
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
  */
+static void ccwgroup_ungroup_callback(struct device *dev)
+{
+	struct ccwgroup_device *gdev = to_ccwgroupdev(dev);
+
+	__ccwgroup_remove_symlinks(gdev);
+	device_unregister(dev);
+}
+
 static ssize_t
 ccwgroup_ungroup_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
 {
 	struct ccwgroup_device *gdev;
+	int rc;
 
 	gdev = to_ccwgroupdev(dev);
 
 	if (gdev->state != CCWGROUP_OFFLINE)
 		return -EINVAL;
 
-	__ccwgroup_remove_symlinks(gdev);
-	device_unregister(dev);
-
+	/* Note that we cannot unregister the device from one of its
+	 * attribute methods, so we have to use this roundabout approach.
+	 */
+	rc = device_schedule_callback(dev, ccwgroup_ungroup_callback);
+	if (rc)
+		count = rc;
 	return count;
 }
 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH] sysfs: reinstate exclusion between method calls and attribute unregistration
  2007-03-15 16:51                                                               ` Linus Torvalds
  2007-03-15 19:50                                                                 ` [PATCH] sysfs and driver core: add callback helper, used by SCSI and S390 Alan Stern
@ 2007-03-15 19:51                                                                 ` Alan Stern
  1 sibling, 0 replies; 61+ messages in thread
From: Alan Stern @ 2007-03-15 19:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Cornelia Huck, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

This patch (as869) reinstates the mutual exclusion between sysfs
attribute method calls and attribute unregistration.  The
previously-reported deadlocks have been fixed, and this exclusion is
by far the simplest way to avoid races during driver unbinding.

The check for orphaned read-buffers has been moved down slightly, so
that the remainder of a partially-read buffer will still be available
to userspace even after the attribute has been unregistered.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

---

Index: 2.6.21-rc3-git9/fs/sysfs/inode.c
===================================================================
--- 2.6.21-rc3-git9.orig/fs/sysfs/inode.c
+++ 2.6.21-rc3-git9/fs/sysfs/inode.c
@@ -222,13 +222,17 @@ const unsigned char * sysfs_get_name(str
 
 static inline void orphan_all_buffers(struct inode *node)
 {
-	struct sysfs_buffer_collection *set = node->i_private;
+	struct sysfs_buffer_collection *set;
 	struct sysfs_buffer *buf;
 
 	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
-	if (node->i_private) {
-		list_for_each_entry(buf, &set->associates, associates)
+	set = node->i_private;
+	if (set) {
+		list_for_each_entry(buf, &set->associates, associates) {
+			down(&buf->sem);
 			buf->orphaned = 1;
+			up(&buf->sem);
+		}
 	}
 	mutex_unlock(&node->i_mutex);
 }
Index: 2.6.21-rc3-git9/fs/sysfs/file.c
===================================================================
--- 2.6.21-rc3-git9.orig/fs/sysfs/file.c
+++ 2.6.21-rc3-git9/fs/sysfs/file.c
@@ -168,12 +168,12 @@ sysfs_read_file(struct file *file, char 
 	ssize_t retval = 0;
 
 	down(&buffer->sem);
-	if (buffer->orphaned) {
-		retval = -ENODEV;
-		goto out;
-	}
 	if (buffer->needs_read_fill) {
-		if ((retval = fill_read_buffer(file->f_path.dentry,buffer)))
+		if (buffer->orphaned)
+			retval = -ENODEV;
+		else
+			retval = fill_read_buffer(file->f_path.dentry,buffer);
+		if (retval)
 			goto out;
 	}
 	pr_debug("%s: count = %zd, ppos = %lld, buf = %s\n",


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
@ 2007-03-10 20:44 Alan Stern
  0 siblings, 0 replies; 61+ messages in thread
From: Alan Stern @ 2007-03-10 20:44 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Linus Torvalds, Dmitry Torokhov, Maneesh Soni, gregkh,
	James Bottomley, Kernel development list

[For the start of this thread, see 
<http://marc.theaimsgroup.com/?l=linux-kernel&m=117320893726621&w=2>.]

On Wed, 7 Mar 2007, Linus Torvalds wrote:

> So you just pointed to *another* data structure that apparently violates 
> the "you MUST use refcounting" rule.
> 
> What is it with you people? It's really simple. Data structures must be 
> refcounted if you can reach them two different ways.
> 
> If you don't use refcounting, then you'd better make sure that the data 
> can be reached only one way (for example, by *not* exposing it for sysfs).
> 
> It really *is* that simple. Read the CodingStyle rules.

Linus's analysis is correct as far as it goes, but it misses some very 
important points.  The _real_ problem here, which nobody has pointed out 
so far, is not device removal or driver unloading.  It is driver 
unbinding -- with its consequent issue of access rights.

When a driver is unbound from a device, when should the driver stop trying 
to access that device?  The obvious answer is that it must stop before its 
release() method returns.  Otherwise the device might get bound to 
another driver and we would have both drivers trying to talk to it at the 
same time.

In other words, when a driver unbinds from a device, it loses its right to
access that device.  Same goes for any device-related data structures that
weren't created by the driver itself.  When you realize this, it becomes
obvious that the driver faces a synchronization problem.  All its entry
points must be synchronized with release(), to avoid races.

So there actually are two things a driver has to worry about:

	The lifetime of its private data structures (which can be solved
	using refcounts as Linus advocated);

	The race between release() and other activities (which cannot
	be solved by refcounts but needs a true synchronization technique,
	such as a mutex).

No doubt some of this sounds familiar; the race between open() and
disconnect() for char device drivers is one we have faced many times and
not always solved perfectly.  Also note that this is a fundamental
problem, affecting many facilities in addition to sysfs.


One way to solve these problems is to put all the responsibility on the 
driver.  Make it refcount its data structures and use mutexes.  This is 
not very attractive for several reasons:

	_Lots_ of drivers are affected.  Pretty much any driver which
	registers a char device or a sysfs attribute file.

	_Lots_ of code would need to be changed, adding considerable
	bloat.  Every show()/store() method would need to acquire a mutex,
	and many would need to be passed an additional argument, requiring
	a change in the sysfs API.  (I can explain why in a follow-up 
	email if anyone is interested.)

	Most importantly, doing all the refcounting and mutual exclusion
	correctly is quite hard.  It's amazingly easy to make mistakes
	in these areas.  The chances of getting it right while changing
	multiple functions in every single driver are infinitesimal.

Another approach is to put all the responsibility on the core subsystems
that handle driver registration.  They should enforce rigidly two
principles: "No driver callbacks occur after unregistration" and its
prerequisite, "Unregistration is mutually exclusive with driver
callbacks".  (This is exactly what Oliver's original patch did for sysfs.)

	The number of core subsystems affected is much smaller than the
	total number of drivers.  Sysfs, debugfs, the char device
	subsystem, maybe a few others.

	Drivers would no longer have to worry about doing their own
	synchronization or refcounts.  It would be guaranteed that a
	private data structure would never be accessed from sysfs after
	device_remove_file() returned, so the structure could safely and
	easily be deallocated as part of release().

At the expense of complicating a few central subsystems, we could simplify
a lot of drivers.  I think this is a worthwhile tradeoff.

It does have a small disadvantage; it means that an entry point would
deadlock if it tried to unregister itself.  (The example which started
this whole thread was sdev_store_delete() in the SCSI core.  Writing to
that attribute unregisters the device to which it belongs.)  Clearly the
actual unregistration would have to performed separately in a workqueue.  
I think the number of places where this occurs is pretty small.


It's true that this approach goes against the general philosophy used
elsewhere in the kernel.  Refcounting without synchronization is the
general rule.

However unbinding is a special case.  Normally with refcounting, it
doesn't matter when a driver tries to read or write a data structure.  So
long as the driver still holds a reference, the data will be there and the
access will be okay.

But not with unbinding!  After unbinding, the data will still be there but 
it might be owned by another driver.  Even worse, instead of just 
accessing data the code might try to access the device.

The basic assumption behind the refcounting approach is that a resource 
will be used for one purpose and then discarded, so accessing it will 
always be okay.  Driver unbinding violates this assumption; the devices 
and data structures that a driver binds to can be bound, unbound, and 
rebound multiple times.  Simple refcounting isn't sufficient to handle the 
situation.


In short, I think Oliver's original patch should be reinstated.  
(sdev_store_delete() can easily be rewritten to use a workqueue.)  Not
only that, it should be exanded upon.  For instance, it handles regular
sysfs files but it ignores binary files -- a clear oversight.  Debugfs and
the char device subsystem should be modified similarly.  Maybe also
procfs, and perhaps others.

Alan Stern


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07 18:02         ` Linus Torvalds
@ 2007-03-07 18:16           ` Oliver Neukum
  0 siblings, 0 replies; 61+ messages in thread
From: Oliver Neukum @ 2007-03-07 18:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dmitry Torokhov, Hugh Dickins, Maneesh Soni, Greg Kroah-Hartman,
	Adrian Bunk, linux-kernel

Am Mittwoch, 7. März 2007 19:02 schrieb Linus Torvalds:
> 
> On Wed, 7 Mar 2007, Oliver Neukum wrote:
> >
> > The problem also exists with unplugging devices. Drivers get no feedback
> > to tell them when it is safe to free the data structures associated with
> > an attribute.
> 
> So you just pointed to *another* data structure that apparently violates 
> the "you MUST use refcounting" rule.
> 
> What is it with you people? It's really simple. Data structures must be 
> refcounted if you can reach them two different ways.
> 
> If you don't use refcounting, then you'd better make sure that the data 
> can be reached only one way (for example, by *not* exposing it for sysfs).
> 
> It really *is* that simple. Read the CodingStyle rules.

Very well, there seems to be no clean way to avoid that work.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07 16:59       ` Oliver Neukum
@ 2007-03-07 18:02         ` Linus Torvalds
  2007-03-07 18:16           ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-03-07 18:02 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Dmitry Torokhov, Hugh Dickins, Oliver Neukum, Maneesh Soni,
	Greg Kroah-Hartman, Adrian Bunk, linux-kernel



On Wed, 7 Mar 2007, Oliver Neukum wrote:
>
> The problem also exists with unplugging devices. Drivers get no feedback
> to tell them when it is safe to free the data structures associated with
> an attribute.

So you just pointed to *another* data structure that apparently violates 
the "you MUST use refcounting" rule.

What is it with you people? It's really simple. Data structures must be 
refcounted if you can reach them two different ways.

If you don't use refcounting, then you'd better make sure that the data 
can be reached only one way (for example, by *not* exposing it for sysfs).

It really *is* that simple. Read the CodingStyle rules.

		Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07 16:52     ` Linus Torvalds
@ 2007-03-07 16:59       ` Oliver Neukum
  2007-03-07 18:02         ` Linus Torvalds
  0 siblings, 1 reply; 61+ messages in thread
From: Oliver Neukum @ 2007-03-07 16:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dmitry Torokhov, Hugh Dickins, Oliver Neukum, Maneesh Soni,
	Greg Kroah-Hartman, Adrian Bunk, linux-kernel

Am Mittwoch, 7. März 2007 17:52 schrieb Linus Torvalds:
> 
> On Wed, 7 Mar 2007, Dmitry Torokhov wrote:
> > 
> > ... with the exception that it will again make data associated with
> > sysfs attributes accessible past the point of returning from
> > sysfs_remove_file. And that was the point so drivers would not have to
> > care about handling access to extra data (such as static strings) past
> > the driver unload.
> 
> Drivers are unloaded by stopping the whole machine (exactly because module 
> unload is otherwise so hard to handle), so that never happens unless you 
> actively block. In other words, if you do something as simple as

The problem also exists with unplugging devices. Drivers get no feedback
to tell them when it is safe to free the data structures associated with
an attribute.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07 15:56   ` Dmitry Torokhov
@ 2007-03-07 16:52     ` Linus Torvalds
  2007-03-07 16:59       ` Oliver Neukum
  0 siblings, 1 reply; 61+ messages in thread
From: Linus Torvalds @ 2007-03-07 16:52 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Hugh Dickins, Oliver Neukum, Maneesh Soni, Greg Kroah-Hartman,
	Adrian Bunk, linux-kernel



On Wed, 7 Mar 2007, Dmitry Torokhov wrote:
> 
> ... with the exception that it will again make data associated with
> sysfs attributes accessible past the point of returning from
> sysfs_remove_file. And that was the point so drivers would not have to
> care about handling access to extra data (such as static strings) past
> the driver unload.

Drivers are unloaded by stopping the whole machine (exactly because module 
unload is otherwise so hard to handle), so that never happens unless you 
actively block. In other words, if you do something as simple as

	if (inode->i_private_data)
		sysfs_flush_buffer(buffer);

then there is no race with unloading (unless the driver itself does 
something stupid, of course - but the whole point of having a kernel 
buffer is so that it does *not* have to make user accesses etc).

But the one thing you should *not* do is to depend on a sleeping lock, 
because that breaks the whole model!

			Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07  1:56 ` Linus Torvalds
  2007-03-07 14:38   ` Oliver Neukum
@ 2007-03-07 15:56   ` Dmitry Torokhov
  2007-03-07 16:52     ` Linus Torvalds
  1 sibling, 1 reply; 61+ messages in thread
From: Dmitry Torokhov @ 2007-03-07 15:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Oliver Neukum, Maneesh Soni, Greg Kroah-Hartman,
	Adrian Bunk, linux-kernel

On 3/6/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>  - removing the buffer is now just
>
>        mutex_lock(&inode->i_mutex);
>        buffer = inode->i_private;
>        inode->i_private = NULL;
>        mutex_unlock(&inode->i_mutex);
>
>        put_sysfs_buffer(buffer);
>
>  - everybody is happy!
>

... with the exception that it will again make data associated with
sysfs attributes accessible past the point of returning from
sysfs_remove_file. And that was the point so drivers would not have to
care about handling access to extra data (such as static strings) past
the driver unload.

I wonder if we should keep Oliver's change and require attribute
implementations to offload "delete me" kind of actions to workqueues.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-07  1:56 ` Linus Torvalds
@ 2007-03-07 14:38   ` Oliver Neukum
  2007-03-07 15:56   ` Dmitry Torokhov
  1 sibling, 0 replies; 61+ messages in thread
From: Oliver Neukum @ 2007-03-07 14:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Maneesh Soni, Greg Kroah-Hartman, Adrian Bunk,
	linux-kernel

Am Mittwoch, 7. März 2007 02:56 schrieb Linus Torvalds:
> Anyway, I'm unable to revert the broken commit, since there are now other 
> changes that depend on it, but can somebody *please* do that? I'll apply 
> Hugh's silly patch in the meantime, just to avoid the lockup.

As you like it. This patch reverts it.

	Regards
		Oliver

Signed-off-by: Oliver Neukum <oliver@neukum.name>
-----

--- orig/fs/sysfs/inode.c	2007-03-07 10:49:42.000000000 +0100
+++ linux-2.6.21-rc3/fs/sysfs/inode.c	2007-03-07 10:52:56.000000000 +0100
@@ -220,20 +220,6 @@
 	return NULL;
 }
 
-static inline void orphan_all_buffers(struct inode *node)
-{
-	struct sysfs_buffer_collection *set = node->i_private;
-	struct sysfs_buffer *buf;
-
-	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
-	if (node->i_private) {
-		list_for_each_entry(buf, &set->associates, associates)
-			buf->orphaned = 1;
-	}
-	mutex_unlock(&node->i_mutex);
-}
-
-
 /*
  * Unhashes the dentry corresponding to given sysfs_dirent
  * Called with parent inode's i_mutex held.
@@ -241,23 +227,16 @@
 void sysfs_drop_dentry(struct sysfs_dirent * sd, struct dentry * parent)
 {
 	struct dentry * dentry = sd->s_dentry;
-	struct inode *inode;
 
 	if (dentry) {
 		spin_lock(&dcache_lock);
 		spin_lock(&dentry->d_lock);
 		if (!(d_unhashed(dentry) && dentry->d_inode)) {
-			inode = dentry->d_inode;
-			spin_lock(&inode->i_lock);
-			__iget(inode);
-			spin_unlock(&inode->i_lock);
 			dget_locked(dentry);
 			__d_drop(dentry);
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
 			simple_unlink(parent->d_inode, dentry);
-			orphan_all_buffers(inode);
-			iput(inode);
 		} else {
 			spin_unlock(&dentry->d_lock);
 			spin_unlock(&dcache_lock);
--- orig/fs/sysfs/file.c	2007-03-07 10:37:28.000000000 +0100
+++ linux-2.6.21-rc3/fs/sysfs/file.c	2007-03-07 10:54:00.000000000 +0100
@@ -7,7 +7,6 @@
 #include <linux/kobject.h>
 #include <linux/namei.h>
 #include <linux/poll.h>
-#include <linux/list.h>
 #include <asm/uaccess.h>
 #include <asm/semaphore.h>
 
@@ -52,30 +51,6 @@
 };
 
 /**
- *	add_to_collection - add buffer to a collection
- *	@buffer:	buffer to be added
- *	@node:		inode of set to add to
- */
-
-static inline void
-add_to_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-	struct sysfs_buffer_collection *set = node->i_private;
-
-	mutex_lock(&node->i_mutex);
-	list_add(&buffer->associates, &set->associates);
-	mutex_unlock(&node->i_mutex);
-}
-
-static inline void
-remove_from_collection(struct sysfs_buffer *buffer, struct inode *node)
-{
-	mutex_lock(&node->i_mutex);
-	list_del(&buffer->associates);
-	mutex_unlock(&node->i_mutex);
-}
-
-/**
  *	fill_read_buffer - allocate and fill buffer from object.
  *	@dentry:	dentry pointer.
  *	@buffer:	data buffer for file.
@@ -168,10 +143,6 @@
 	ssize_t retval = 0;
 
 	down(&buffer->sem);
-	if (buffer->orphaned) {
-		retval = -ENODEV;
-		goto out;
-	}
 	if (buffer->needs_read_fill) {
 		if ((retval = fill_read_buffer(file->f_path.dentry,buffer)))
 			goto out;
@@ -261,16 +232,11 @@
 	ssize_t len;
 
 	down(&buffer->sem);
-	if (buffer->orphaned) {
-		len = -ENODEV;
-		goto out;
-	}
 	len = fill_write_buffer(buffer, buf, count);
 	if (len > 0)
 		len = flush_write_buffer(file->f_path.dentry, buffer, len);
 	if (len > 0)
 		*ppos += len;
-out:
 	up(&buffer->sem);
 	return len;
 }
@@ -279,7 +245,6 @@
 {
 	struct kobject *kobj = sysfs_get_kobject(file->f_path.dentry->d_parent);
 	struct attribute * attr = to_attr(file->f_path.dentry);
-	struct sysfs_buffer_collection *set;
 	struct sysfs_buffer * buffer;
 	struct sysfs_ops * ops = NULL;
 	int error = 0;
@@ -309,18 +274,6 @@
 	if (!ops)
 		goto Eaccess;
 
-	/* make sure we have a collection to add our buffers to */
-	mutex_lock(&inode->i_mutex);
-	if (!(set = inode->i_private)) {
-		if (!(set = inode->i_private = kmalloc(sizeof(struct sysfs_buffer_collection), GFP_KERNEL))) {
-			error = -ENOMEM;
-			goto Done;
-		} else {
-			INIT_LIST_HEAD(&set->associates);
-		}
-	}
-	mutex_unlock(&inode->i_mutex);
-
 	/* File needs write support.
 	 * The inode's perms must say it's ok, 
 	 * and we must have a store method.
@@ -346,11 +299,9 @@
 	 */
 	buffer = kzalloc(sizeof(struct sysfs_buffer), GFP_KERNEL);
 	if (buffer) {
-		INIT_LIST_HEAD(&buffer->associates);
 		init_MUTEX(&buffer->sem);
 		buffer->needs_read_fill = 1;
 		buffer->ops = ops;
-		add_to_collection(buffer, inode);
 		file->private_data = buffer;
 	} else
 		error = -ENOMEM;
@@ -375,8 +326,6 @@
 	struct module * owner = attr->owner;
 	struct sysfs_buffer * buffer = filp->private_data;
 
-	if (buffer)
-		remove_from_collection(buffer, inode);
 	kobject_put(kobj);
 	/* After this point, attr should not be accessed. */
 	module_put(owner);
--- orig/fs/sysfs/mount.c	2007-03-07 10:37:50.000000000 +0100
+++ linux-2.6.21-rc3/fs/sysfs/mount.c	2007-03-07 10:38:14.000000000 +0100
@@ -19,12 +19,9 @@
 struct super_block * sysfs_sb = NULL;
 struct kmem_cache *sysfs_dir_cachep;
 
-static void sysfs_clear_inode(struct inode *inode);
-
 static const struct super_operations sysfs_ops = {
 	.statfs		= simple_statfs,
 	.drop_inode	= sysfs_delete_inode,
-	.clear_inode	= sysfs_clear_inode,
 };
 
 static struct sysfs_dirent sysfs_root = {
@@ -35,11 +32,6 @@
 	.s_iattr	= NULL,
 };
 
-static void sysfs_clear_inode(struct inode *inode)
-{
-	kfree(inode->i_private);
-}
-
 static int sysfs_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct inode *inode;
--- orig/fs/sysfs/sysfs.h	2007-03-07 10:38:44.000000000 +0100
+++ linux-2.6.21-rc3/fs/sysfs/sysfs.h	2007-03-07 10:39:26.000000000 +0100
@@ -46,21 +46,15 @@
 };
 
 struct sysfs_buffer {
-	struct list_head		associates;
 	size_t				count;
 	loff_t				pos;
 	char				* page;
 	struct sysfs_ops		* ops;
 	struct semaphore		sem;
-	int				orphaned;
 	int				needs_read_fill;
 	int				event;
 };
 
-struct sysfs_buffer_collection {
-	struct list_head	associates;
-};
-
 static inline struct kobject * to_kobj(struct dentry * dentry)
 {
 	struct sysfs_dirent * sd = dentry->d_fsdata;

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-06 19:20 Hugh Dickins
  2007-03-06 20:16 ` Oliver Neukum
@ 2007-03-07  1:56 ` Linus Torvalds
  2007-03-07 14:38   ` Oliver Neukum
  2007-03-07 15:56   ` Dmitry Torokhov
  1 sibling, 2 replies; 61+ messages in thread
From: Linus Torvalds @ 2007-03-07  1:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oliver Neukum, Maneesh Soni, Greg Kroah-Hartman, Adrian Bunk,
	linux-kernel



On Tue, 6 Mar 2007, Hugh Dickins wrote:
> 
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.

Gaah. What a crock.

I really don't see any alternative to just reverting the whole change. 
Hugh's patch is simple, but rather pointless.

The fact is, the whole change is *bogus*.

We don't "lock" datastructures. We *reference count* them!

This is so fundamental that it's even mentioned in the file 
Documentation/CodingStyle in "Chapter 11: Data structures".

The whole "orphaned" kind of locking is broken. It's stupid. The way we do 
races between removal and use is that initial setup sets a reference count 
of 1, and something really simple like:

	static inline struct sysfs_buffer *get_sysfs_buffer(struct inode *inode)
	{
		struct sysfs_buffer *buffer = inode->i_private;

		BUG_ON(!mutex_locked(&inode->i_mutex));
		if (buffer)
			atomic_inc(&buffer->count);
		return buffer;
	}

	static inline void put_sysfs_buffer(struct sysfs_buffer *buffer)
	{
		if (atomic_dec_and_test(&buffer->count))
			kfree(buffer);
	}

and then the rule is:

 - everybody uses "get_sysfs_buffer()" to follow the reference (and yes, 
   you obviously have to hold "inode->i_mutex" for this to be safe! I 
   added the BUG_ON() as an example)

 - everybody uses "put_buffer()" to release it (and we simply don't *care* 
   whether somebody else released it too, since everybody has a reference 
   count)

 - removing the buffer is now just

	mutex_lock(&inode->i_mutex);
	buffer = inode->i_private;
	inode->i_private = NULL;
	mutex_unlock(&inode->i_mutex);

	put_sysfs_buffer(buffer);

 - everybody is happy!

Anyway, I'm unable to revert the broken commit, since there are now other 
changes that depend on it, but can somebody *please* do that? I'll apply 
Hugh's silly patch in the meantime, just to avoid the lockup.

			Linus

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-06 19:20 Hugh Dickins
@ 2007-03-06 20:16 ` Oliver Neukum
  2007-03-07  1:56 ` Linus Torvalds
  1 sibling, 0 replies; 61+ messages in thread
From: Oliver Neukum @ 2007-03-06 20:16 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oliver Neukum, Maneesh Soni, Greg Kroah-Hartman, Adrian Bunk,
	Linus Torvalds, linux-kernel

Am Dienstag, 6. März 2007 20:20 schrieb Hugh Dickins:
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.

I had not thought about sysfs removing files in sysfs.

> Suspend no longer deadlocks with the following silly patch, but I expect
> this either pokes a small hole in your scheme, or renders it pointless.

The latter.
 
> Maybe that commit needs to be reverted, or maybe you can see how to fix
> it up for -rc3.

If you want a quick fix a work queue could be used, but it's a kludge.
Suggestions, anybody?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 61+ messages in thread

* 2.6.21-rc suspend regression: sysfs deadlock
@ 2007-03-06 19:20 Hugh Dickins
  2007-03-06 20:16 ` Oliver Neukum
  2007-03-07  1:56 ` Linus Torvalds
  0 siblings, 2 replies; 61+ messages in thread
From: Hugh Dickins @ 2007-03-06 19:20 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Maneesh Soni, Greg Kroah-Hartman, Adrian Bunk, Linus Torvalds,
	linux-kernel

Resume from RAM on a ThinkPad T43p is now happy with Thomas' periodic
tick fix - the most unusable aspect of that for me had been how slow
repeat keys were to start repeating, but that's all fine now.

But suspend to RAM still hanging, unless I "chmod a-x /usr/sbin/docker"
on SuSE 10.2: docker undock tries to unregister /sys/block/sr0 and hangs:

60x60         D B0415080     0 10778  10771                     (NOTLB)
       e8227e04 00000086 e80c60b0 b0415080 ef3f5454 b041dc20 ef3f5430 00000001 
       e80c60b0 72af360e 00000085 00001941 e80c61bc e8227e00 b01606bf ef47d3c0 
       ed07c1dc ed07c1e4 00000246 e8227e30 b02f6ef0 e80c60b0 00000001 e80c60b0 
Call Trace:
 [<b02f6ef0>] __down+0xaa/0xb8
 [<b02f6de6>] __down_failed+0xa/0x10
 [<b0180529>] sysfs_drop_dentry+0xa2/0xda
 [<b01819b3>] __sysfs_remove_dir+0x6d/0xf8
 [<b0181a53>] sysfs_remove_dir+0x15/0x20
 [<b01d49a9>] kobject_del+0x16/0x22
 [<b0230041>] device_del+0x1c9/0x1e2
 [<b025705a>] __scsi_remove_device+0x43/0x7a
 [<b02570b0>] scsi_remove_device+0x1f/0x2b
 [<b0256a44>] sdev_store_delete+0x16/0x1b
 [<b022f0a0>] dev_attr_store+0x32/0x34
 [<b0180931>] flush_write_buffer+0x37/0x3d
 [<b0180995>] sysfs_write_file+0x5e/0x82
 [<b01507f5>] vfs_write+0xa7/0x150
 [<b0150950>] sys_write+0x47/0x6b
 [<b0103d56>] sysenter_past_esp+0x5f/0x85
              /usr/lib/dockutils/hooks/thinkpad/60x60 undock
              /usr/lib/dockutils/dockhandler undock
              /usr/sbin/docker undock
              /etc/pm/hooks/23dock suspend

This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
while calling flush_write_buffer, and flushing that particular write
buffer entails downing buffer->sem in orphan_all_buffers.

Suspend no longer deadlocks with the following silly patch, but I expect
this either pokes a small hole in your scheme, or renders it pointless.
Maybe that commit needs to be reverted, or maybe you can see how to fix
it up for -rc3.

Thanks,
Hugh

--- 2.6.21-rc2-git5/fs/sysfs/inode.c	2007-02-28 08:30:26.000000000 +0000
+++ linux/fs/sysfs/inode.c	2007-03-06 18:03:13.000000000 +0000
@@ -227,11 +227,8 @@ static inline void orphan_all_buffers(st
 
 	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
 	if (node->i_private) {
-		list_for_each_entry(buf, &set->associates, associates) {
-			down(&buf->sem);
+		list_for_each_entry(buf, &set->associates, associates)
 			buf->orphaned = 1;
-			up(&buf->sem);
-		}
 	}
 	mutex_unlock(&node->i_mutex);
 }

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2007-03-15 19:51 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-08 13:05 refcounting drivers' data structures used in sysfs buffers Oliver Neukum
2007-03-08 16:02 ` Alan Stern
2007-03-09  0:45   ` Oliver Neukum
2007-03-09 16:32     ` Alan Stern
2007-03-09 16:44       ` Oliver Neukum
2007-03-09 17:02         ` Dmitry Torokhov
2007-03-09 17:18           ` Oliver Neukum
2007-03-09 17:34             ` Dmitry Torokhov
2007-03-09 19:32               ` Alan Stern
2007-03-09 20:05                 ` Oliver Neukum
2007-03-09 20:27                   ` Alan Stern
2007-03-09 20:39                     ` Oliver Neukum
2007-03-09 20:08               ` Alan Stern
2007-03-09 20:48                 ` Oliver Neukum
2007-03-10 19:19                   ` Alan Stern
2007-03-12  8:54                     ` Oliver Neukum
2007-03-12 14:57                       ` Alan Stern
2007-03-12 15:23                         ` Oliver Neukum
2007-03-12 15:42                           ` Dmitry Torokhov
2007-03-12 15:59                             ` Oliver Neukum
2007-03-12 16:21                               ` Alan Stern
2007-03-12 18:25                                 ` Oliver Neukum
2007-03-12 19:31                                   ` Alan Stern
2007-03-12 19:49                                     ` Oliver Neukum
2007-03-12 20:03                                       ` Alan Stern
2007-03-12 20:15                                         ` Oliver Neukum
2007-03-12 20:31                                         ` Dmitry Torokhov
2007-03-12 20:45                                           ` Alan Stern
2007-03-12 21:31                                           ` Richard Purdie
2007-03-13 15:00                                             ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
2007-03-13 18:42                                               ` Cornelia Huck
2007-03-13 21:20                                                 ` Linus Torvalds
2007-03-14 16:12                                                   ` Alan Stern
2007-03-14 18:43                                                     ` Cornelia Huck
2007-03-14 19:23                                                       ` Alan Stern
2007-03-15 10:27                                                         ` Cornelia Huck
2007-03-15 12:31                                                           ` Hugh Dickins
2007-03-15 13:02                                                             ` Oliver Neukum
2007-03-15 13:22                                                               ` Dmitry Torokhov
2007-03-15 13:59                                                                 ` Hugh Dickins
2007-03-15 14:27                                                           ` Alan Stern
2007-03-15 15:32                                                             ` Cornelia Huck
2007-03-15 16:29                                                             ` Hugh Dickins
2007-03-15 16:51                                                               ` Linus Torvalds
2007-03-15 19:50                                                                 ` [PATCH] sysfs and driver core: add callback helper, used by SCSI and S390 Alan Stern
2007-03-15 19:51                                                                 ` [PATCH] sysfs: reinstate exclusion between method calls and attribute unregistration Alan Stern
2007-03-13 19:00                                               ` 2.6.21-rc suspend regression: sysfs deadlock Hugh Dickins
2007-03-13 20:09                                                 ` Alan Stern
2007-03-13 20:55                                                   ` Hugh Dickins
2007-03-13 21:08                                                     ` Dmitry Torokhov
2007-03-13 21:20                                                     ` Alan Stern
  -- strict thread matches above, loose matches on Subject: below --
2007-03-10 20:44 Alan Stern
2007-03-06 19:20 Hugh Dickins
2007-03-06 20:16 ` Oliver Neukum
2007-03-07  1:56 ` Linus Torvalds
2007-03-07 14:38   ` Oliver Neukum
2007-03-07 15:56   ` Dmitry Torokhov
2007-03-07 16:52     ` Linus Torvalds
2007-03-07 16:59       ` Oliver Neukum
2007-03-07 18:02         ` Linus Torvalds
2007-03-07 18:16           ` Oliver Neukum

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).