linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* refcounting drivers' data structures used in sysfs buffers
@ 2007-03-08 13:05 Oliver Neukum
  2007-03-08 16:02 ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-08 13:05 UTC (permalink / raw)
  To: Dmitry Torokhov, Maneesh Soni, gregkh, Alan Stern, linux-kernel

Hi,

after a lightning bolt from high above I've been looking into refcounting
the data structures drivers use to provide the data used to refill sysfs
buffers. I've come to the following conclusion.

1. struct sysfs_buffer must have a struct kref * and probably a destructor
pointer
2. drivers must be able to pass these pointers through an extended
device_create_file()
3. Drivers must use refcounting if they want to use attributes
4. read/write/poll must do refcounting

I am not sure where to store the pointers. struct sysfs_dirent() looks
like the obvious choice. Comments?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-08 13:05 refcounting drivers' data structures used in sysfs buffers Oliver Neukum
@ 2007-03-08 16:02 ` Alan Stern
  2007-03-09  0:45   ` Oliver Neukum
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-08 16:02 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Thu, 8 Mar 2007, Oliver Neukum wrote:

> Hi,
> 
> after a lightning bolt from high above I've been looking into refcounting
> the data structures drivers use to provide the data used to refill sysfs
> buffers. I've come to the following conclusion.
> 
> 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> pointer
> 2. drivers must be able to pass these pointers through an extended
> device_create_file()
> 3. Drivers must use refcounting if they want to use attributes
> 4. read/write/poll must do refcounting
> 
> I am not sure where to store the pointers. struct sysfs_dirent() looks
> like the obvious choice. Comments?

Can you explain the reasoning that led to these conclusions?  And what 
exactly was your lightning bolt?

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-08 16:02 ` Alan Stern
@ 2007-03-09  0:45   ` Oliver Neukum
  2007-03-09 16:32     ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-09  0:45 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> On Thu, 8 Mar 2007, Oliver Neukum wrote:
> 
> > Hi,
> > 
> > after a lightning bolt from high above I've been looking into refcounting
> > the data structures drivers use to provide the data used to refill sysfs
> > buffers. I've come to the following conclusion.
> > 
> > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > pointer
> > 2. drivers must be able to pass these pointers through an extended
> > device_create_file()
> > 3. Drivers must use refcounting if they want to use attributes
> > 4. read/write/poll must do refcounting
> > 
> > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > like the obvious choice. Comments?
> 
> Can you explain the reasoning that led to these conclusions?  And what 
> exactly was your lightning bolt?

The old race between disconnect and IO to attribute via sysfs again.
If I cannot disassociate the drivers from the buffers in the buffers, drivers
must not deallocate the data necessary to answer sysfs callbacks while
a buffer exists. Reading from a buffer must up a refcount in the driver's
data structures. The question becomes how to get a pointer to the buffer.
And it cannot live in the dentry as the dentry can go away while files
are still open. This leaves the inode or the buffer.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09  0:45   ` Oliver Neukum
@ 2007-03-09 16:32     ` Alan Stern
  2007-03-09 16:44       ` Oliver Neukum
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-09 16:32 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Oliver Neukum wrote:

> Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> > On Thu, 8 Mar 2007, Oliver Neukum wrote:
> > 
> > > Hi,
> > > 
> > > after a lightning bolt from high above I've been looking into refcounting
> > > the data structures drivers use to provide the data used to refill sysfs
> > > buffers. I've come to the following conclusion.
> > > 
> > > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > > pointer
> > > 2. drivers must be able to pass these pointers through an extended
> > > device_create_file()
> > > 3. Drivers must use refcounting if they want to use attributes
> > > 4. read/write/poll must do refcounting
> > > 
> > > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > > like the obvious choice. Comments?
> > 
> > Can you explain the reasoning that led to these conclusions?  And what 
> > exactly was your lightning bolt?
> 
> The old race between disconnect and IO to attribute via sysfs again.
> If I cannot disassociate the drivers from the buffers in the buffers, drivers
> must not deallocate the data necessary to answer sysfs callbacks while
> a buffer exists.

Why wouldn't you be able to dissociate a driver from a buffer?  That was 
the whole point of adding .orphan to sysfs_buffer and creating 
sysfs_buffer_collection -- it was supposed to solve exactly this race.

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 16:32     ` Alan Stern
@ 2007-03-09 16:44       ` Oliver Neukum
  2007-03-09 17:02         ` Dmitry Torokhov
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-09 16:44 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 17:32 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Oliver Neukum wrote:
> 
> > Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> > > On Thu, 8 Mar 2007, Oliver Neukum wrote:
> > > 
> > > > Hi,
> > > > 
> > > > after a lightning bolt from high above I've been looking into refcounting
> > > > the data structures drivers use to provide the data used to refill sysfs
> > > > buffers. I've come to the following conclusion.
> > > > 
> > > > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > > > pointer
> > > > 2. drivers must be able to pass these pointers through an extended
> > > > device_create_file()
> > > > 3. Drivers must use refcounting if they want to use attributes
> > > > 4. read/write/poll must do refcounting
> > > > 
> > > > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > > > like the obvious choice. Comments?
> > > 
> > > Can you explain the reasoning that led to these conclusions?  And what 
> > > exactly was your lightning bolt?
> > 
> > The old race between disconnect and IO to attribute via sysfs again.
> > If I cannot disassociate the drivers from the buffers in the buffers, drivers
> > must not deallocate the data necessary to answer sysfs callbacks while
> > a buffer exists.
> 
> Why wouldn't you be able to dissociate a driver from a buffer?  That was 
> the whole point of adding .orphan to sysfs_buffer and creating 
> sysfs_buffer_collection -- it was supposed to solve exactly this race.

It did solve the race but deadlocked when unbinding devices through sysfs.
Linux therefore asked for the patch to be reverted and wants the isue solved
with refcounting.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 16:44       ` Oliver Neukum
@ 2007-03-09 17:02         ` Dmitry Torokhov
  2007-03-09 17:18           ` Oliver Neukum
  0 siblings, 1 reply; 51+ messages in thread
From: Dmitry Torokhov @ 2007-03-09 17:02 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

On 3/9/07, Oliver Neukum <oneukum@suse.de> wrote:
> Am Freitag, 9. März 2007 17:32 schrieb Alan Stern:
> > On Fri, 9 Mar 2007, Oliver Neukum wrote:
> >
> > > Am Donnerstag, 8. März 2007 17:02 schrieb Alan Stern:
> > > > On Thu, 8 Mar 2007, Oliver Neukum wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > after a lightning bolt from high above I've been looking into refcounting
> > > > > the data structures drivers use to provide the data used to refill sysfs
> > > > > buffers. I've come to the following conclusion.
> > > > >
> > > > > 1. struct sysfs_buffer must have a struct kref * and probably a destructor
> > > > > pointer
> > > > > 2. drivers must be able to pass these pointers through an extended
> > > > > device_create_file()
> > > > > 3. Drivers must use refcounting if they want to use attributes
> > > > > 4. read/write/poll must do refcounting
> > > > >
> > > > > I am not sure where to store the pointers. struct sysfs_dirent() looks
> > > > > like the obvious choice. Comments?
> > > >
> > > > Can you explain the reasoning that led to these conclusions?  And what
> > > > exactly was your lightning bolt?
> > >
> > > The old race between disconnect and IO to attribute via sysfs again.
> > > If I cannot disassociate the drivers from the buffers in the buffers, drivers
> > > must not deallocate the data necessary to answer sysfs callbacks while
> > > a buffer exists.
> >
> > Why wouldn't you be able to dissociate a driver from a buffer?  That was
> > the whole point of adding .orphan to sysfs_buffer and creating
> > sysfs_buffer_collection -- it was supposed to solve exactly this race.
>
> It did solve the race but deadlocked when unbinding devices through sysfs.
> Linux therefore asked for the patch to be reverted and wants the isue solved
> with refcounting.
>

I think we already have all refcounting that is needed. What is
missing is subsystem-provided ->release() hooks for drivers to release
driver-specific resources when a device finally goes away.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 17:02         ` Dmitry Torokhov
@ 2007-03-09 17:18           ` Oliver Neukum
  2007-03-09 17:34             ` Dmitry Torokhov
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-09 17:18 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:

> I think we already have all refcounting that is needed. What is
> missing is subsystem-provided ->release() hooks for drivers to release
> driver-specific resources when a device finally goes away.

This is an interesting idea. Is it nice to pass through release()
but not open() ?

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 17:18           ` Oliver Neukum
@ 2007-03-09 17:34             ` Dmitry Torokhov
  2007-03-09 19:32               ` Alan Stern
  2007-03-09 20:08               ` Alan Stern
  0 siblings, 2 replies; 51+ messages in thread
From: Dmitry Torokhov @ 2007-03-09 17:34 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

On 3/9/07, Oliver Neukum <oneukum@suse.de> wrote:
> Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:
>
> > I think we already have all refcounting that is needed. What is
> > missing is subsystem-provided ->release() hooks for drivers to release
> > driver-specific resources when a device finally goes away.
>
> This is an interesting idea. Is it nice to pass through release()
> but not open() ?
>

Not sure if I follow... Generally speaking open is not a mandatory
operation; however every object in driver model has a release method.
What I am saying is that certain drivers need to have their disconnect
method split in 2 parts - one that shuts down the device and second is
releases resources that might be accesses through sysfs (and other
kernel parts). That second part will have to be called from
subsystem's core ->release() method se we need a release() hook.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 17:34             ` Dmitry Torokhov
@ 2007-03-09 19:32               ` Alan Stern
  2007-03-09 20:05                 ` Oliver Neukum
  2007-03-09 20:08               ` Alan Stern
  1 sibling, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-09 19:32 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Dmitry Torokhov wrote:

> On 3/9/07, Oliver Neukum <oneukum@suse.de> wrote:
> > Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:
> >
> > > I think we already have all refcounting that is needed. What is
> > > missing is subsystem-provided ->release() hooks for drivers to release
> > > driver-specific resources when a device finally goes away.
> >
> > This is an interesting idea. Is it nice to pass through release()
> > but not open() ?
> >
> 
> Not sure if I follow... Generally speaking open is not a mandatory
> operation; however every object in driver model has a release method.
> What I am saying is that certain drivers need to have their disconnect
> method split in 2 parts - one that shuts down the device and second is
> releases resources that might be accesses through sysfs (and other
> kernel parts). That second part will have to be called from
> subsystem's core ->release() method se we need a release() hook.

Dmitry, you're not viewing this correctly.

Adding a new release() callback would solve the problem by creating 
another.  Drivers need to release their data as soon as possible after
they unbind from a device, not when the device itself goes away.  Think
about what would happen if you tried to rmmod a driver.  The rmmod process 
would block until the device was unregistered.

Oliver, your idea won't work either.  Think about what would happen if 
someone did

	rmmod driver_module </sys/devices/.../attribute_file

The rmmod process would never actually read the attribute, so until it 
exited the private data structure would have a positive refcount.  But 
rmmod can't exit until the driver has been unloaded from memory, and it 
can't be unloaded while its data structure is still allocated.  Thus we 
would end up with deadlock; rmmod would hang forever.

It might be better to keep your earlier patch and fix the deadlock you
mentioned earlier, the one that occurs when unbinding a driver through
sysfs.  How exactly does that deadlock work?

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 19:32               ` Alan Stern
@ 2007-03-09 20:05                 ` Oliver Neukum
  2007-03-09 20:27                   ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-09 20:05 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 20:32 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Dmitry Torokhov wrote:
> 
> > On 3/9/07, Oliver Neukum <oneukum@suse.de> wrote:
> > > Am Freitag, 9. März 2007 18:02 schrieb Dmitry Torokhov:
> > >
> > > > I think we already have all refcounting that is needed. What is
> > > > missing is subsystem-provided ->release() hooks for drivers to release
> > > > driver-specific resources when a device finally goes away.
> > >
> > > This is an interesting idea. Is it nice to pass through release()
> > > but not open() ?
> > >
> > 
> > Not sure if I follow... Generally speaking open is not a mandatory
> > operation; however every object in driver model has a release method.
> > What I am saying is that certain drivers need to have their disconnect
> > method split in 2 parts - one that shuts down the device and second is
> > releases resources that might be accesses through sysfs (and other
> > kernel parts). That second part will have to be called from
> > subsystem's core ->release() method se we need a release() hook.
> 
> Dmitry, you're not viewing this correctly.
> 
> Adding a new release() callback would solve the problem by creating 
> another.  Drivers need to release their data as soon as possible after
> they unbind from a device, not when the device itself goes away.  Think

Wait, the callback from closing the file in sysfs is the earliest we can safely
free the data structure. How do you want to free earlier?

> about what would happen if you tried to rmmod a driver.  The rmmod process 
> would block until the device was unregistered.
> 
> Oliver, your idea won't work either.  Think about what would happen if 
> someone did
> 
> 	rmmod driver_module </sys/devices/.../attribute_file

-EBUSY, as currently.
 
> The rmmod process would never actually read the attribute, so until it 
> exited the private data structure would have a positive refcount.  But 
> rmmod can't exit until the driver has been unloaded from memory, and it 
> can't be unloaded while its data structure is still allocated.  Thus we 
> would end up with deadlock; rmmod would hang forever.
> 
> It might be better to keep your earlier patch and fix the deadlock you
> mentioned earlier, the one that occurs when unbinding a driver through
> sysfs.  How exactly does that deadlock work?

http://lkml.org/lkml/2007/3/6/364
http://lkml.org/lkml/2007/3/6/528

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 17:34             ` Dmitry Torokhov
  2007-03-09 19:32               ` Alan Stern
@ 2007-03-09 20:08               ` Alan Stern
  2007-03-09 20:48                 ` Oliver Neukum
  1 sibling, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-09 20:08 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Alan Stern wrote:

> Oliver, your idea won't work either.  Think about what would happen if 
> someone did
> 
> 	rmmod driver_module </sys/devices/.../attribute_file
> 
> The rmmod process would never actually read the attribute, so until it 
> exited the private data structure would have a positive refcount.  But 
> rmmod can't exit until the driver has been unloaded from memory, and it 
> can't be unloaded while its data structure is still allocated.  Thus we 
> would end up with deadlock; rmmod would hang forever.

I take this back.  Redirecting stdin to the attribute file would increase 
the module's refcount and cause rmmod to exit immediately with an error.

After some more thought, I basically agree with what Oliver wrote
originally.  sysfs_dirent is indeed the logical place to store the kref
pointer.  However it needs to be used during open and release, not during
read, write, and poll.  Another point, which Oliver didn't think of, is
that the kref pointer needs to be passed to the driver as an argument in
the show() and store() method calls.

Implementing this will be difficult.  One possibility is to change the 
definition of sysfs_ops, adding the new struct kref * argument to the 
prototypes.  This will involve changing _lots_ of source files, adding an 
unused argument to many functions, which isn't attractive.

The other possibility is to test at runtime whether the kref pointer is 
NULL, and if it is, don't pass it.  This would work, but it isn't 
type-safe.

Finally, there's added complexity in each driver which wants to use the 
new facility.  The module_exit routine will need to be smart enough to 
block until all the private data structures have been released.  
usb-storage does something like that now; it's kind of ugly (although it 
could be improved if appropriate support were added to the core kernel).

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 20:05                 ` Oliver Neukum
@ 2007-03-09 20:27                   ` Alan Stern
  2007-03-09 20:39                     ` Oliver Neukum
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-09 20:27 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Oliver Neukum wrote:

> > Adding a new release() callback would solve the problem by creating 
> > another.  Drivers need to release their data as soon as possible after
> > they unbind from a device, not when the device itself goes away.  Think
> 
> Wait, the callback from closing the file in sysfs is the earliest we can safely
> free the data structure. How do you want to free earlier?

It is _not_ the earliest we can safely free the data structure.

Dmitry's callback occurs when _all_ the sysfs attributes have been
released -- including ones that don't have anything to do with the
driver's private data structure.  Think of the bInterfaceClass attribute,
for example.

But even aside from that, Dmitry's suggestion is wrong.  He wants to add a
second release() method to the driver, which can be called from the
subsystem's release() method -- which doesn't run until the device is
unregistered, possibly long after the driver has been unbound from it.  
Then how could the subsystem even know which drivers need their second
release method called?


> > Oliver, your idea won't work either.  Think about what would happen if 
> > someone did
> > 
> > 	rmmod driver_module </sys/devices/.../attribute_file
> 
> -EBUSY, as currently.

Yeah, I was wrong about that.


> > It might be better to keep your earlier patch and fix the deadlock you
> > mentioned earlier, the one that occurs when unbinding a driver through
> > sysfs.  How exactly does that deadlock work?
> 
> http://lkml.org/lkml/2007/3/6/364
> http://lkml.org/lkml/2007/3/6/528

I get the picture, thanks.

Alan Stern



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 20:27                   ` Alan Stern
@ 2007-03-09 20:39                     ` Oliver Neukum
  0 siblings, 0 replies; 51+ messages in thread
From: Oliver Neukum @ 2007-03-09 20:39 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 21:27 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Oliver Neukum wrote:
> 
> > > Adding a new release() callback would solve the problem by creating 
> > > another.  Drivers need to release their data as soon as possible after
> > > they unbind from a device, not when the device itself goes away.  Think
> > 
> > Wait, the callback from closing the file in sysfs is the earliest we can safely
> > free the data structure. How do you want to free earlier?
> 
> It is _not_ the earliest we can safely free the data structure.
> 
> Dmitry's callback occurs when _all_ the sysfs attributes have been
> released -- including ones that don't have anything to do with the
> driver's private data structure.  Think of the bInterfaceClass attribute,
> for example.

Ok, yes I see. It is by far too late.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 20:08               ` Alan Stern
@ 2007-03-09 20:48                 ` Oliver Neukum
  2007-03-10 19:19                   ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-09 20:48 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Freitag, 9. März 2007 21:08 schrieb Alan Stern:
> After some more thought, I basically agree with what Oliver wrote
> originally.  sysfs_dirent is indeed the logical place to store the kref
> pointer.  However it needs to be used during open and release, not during

OK.

> read, write, and poll.  Another point, which Oliver didn't think of, is
> that the kref pointer needs to be passed to the driver as an argument in
> the show() and store() method calls.

Why? What's wrong with simply calling kref_get/put?
 
> Finally, there's added complexity in each driver which wants to use the 
> new facility.  The module_exit routine will need to be smart enough to 
> block until all the private data structures have been released.  
> usb-storage does something like that now; it's kind of ugly (although it 
> could be improved if appropriate support were added to the core kernel).

If we up the module count for every bound device, all device attributes
should be gone before we ever get that far.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-09 20:48                 ` Oliver Neukum
@ 2007-03-10 19:19                   ` Alan Stern
  2007-03-12  8:54                     ` Oliver Neukum
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-10 19:19 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Fri, 9 Mar 2007, Oliver Neukum wrote:

> Am Freitag, 9. März 2007 21:08 schrieb Alan Stern:
> > After some more thought, I basically agree with what Oliver wrote
> > originally.  sysfs_dirent is indeed the logical place to store the kref
> > pointer.  However it needs to be used during open and release, not during
> 
> OK.
> 
> > read, write, and poll.  Another point, which Oliver didn't think of, is
> > that the kref pointer needs to be passed to the driver as an argument in
> > the show() and store() method calls.
> 
> Why? What's wrong with simply calling kref_get/put?

It's the same old problem: the race between unbind and sysfs I/O.  What
good does holding a reference to the private data structure do if the
show/store method gets called after the driver has been unbound from the
device?  dev_get_drvdata() will no longer provide a valid pointer to the
private data, so the method will have no way to access it.  Hence the
method needs another argument.

(BTW, the sysfs core would actually need more than a kref.  It would also 
need a pointer to a release routine -- the kref contains only the atomic 
counter.  The more you think about it, the more complicated this approach 
becomes.)

> > Finally, there's added complexity in each driver which wants to use the 
> > new facility.  The module_exit routine will need to be smart enough to 
> > block until all the private data structures have been released.  
> > usb-storage does something like that now; it's kind of ugly (although it 
> > could be improved if appropriate support were added to the core kernel).
> 
> If we up the module count for every bound device, all device attributes
> should be gone before we ever get that far.

Not quite right.  However, since every open sysfs file holds a module
reference, if the driver's module_exit has been called then there can be
no open sysfs files, hence no private data still pinned.  Thus this isn't
a problem at all.


But never mind all the above.  I'm going to post another message on this
thread in which I argue that Oliver's original approach was a good one and
should not have been reverted.  The specific problem identified by Hugh
Dickins can be fixed in the way Dmitry first suggested, by doing the real
operation from a workqueue routine.

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-10 19:19                   ` Alan Stern
@ 2007-03-12  8:54                     ` Oliver Neukum
  2007-03-12 14:57                       ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-12  8:54 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Samstag, 10. März 2007 20:19 schrieb Alan Stern:
> On Fri, 9 Mar 2007, Oliver Neukum wrote:
> 
> > Am Freitag, 9. März 2007 21:08 schrieb Alan Stern:
> > > After some more thought, I basically agree with what Oliver wrote
> > > originally.  sysfs_dirent is indeed the logical place to store the kref
> > > pointer.  However it needs to be used during open and release, not during
> > 
> > OK.
> > 
> > > read, write, and poll.  Another point, which Oliver didn't think of, is
> > > that the kref pointer needs to be passed to the driver as an argument in
> > > the show() and store() method calls.
> > 
> > Why? What's wrong with simply calling kref_get/put?
> 
> It's the same old problem: the race between unbind and sysfs I/O.  What
> good does holding a reference to the private data structure do if the
> show/store method gets called after the driver has been unbound from the
> device?  dev_get_drvdata() will no longer provide a valid pointer to the
> private data, so the method will have no way to access it.  Hence the
> method needs another argument.

It does half the job. You can make sure the driver is not asked to access
freed memory.
It is true that a driver will have to mark that device "disconnected"
and return errors if that device's attributes are referenced, but this can
be done internally.

Yes, this is a bit more complicated.
{rant mode}
Who came up with the idea of making life simpler by adding a code path?
All these problems were already solved for device nodes. Ioctl is ugly, but
at least a known code path.
{rant off}

> (BTW, the sysfs core would actually need more than a kref.  It would also 
> need a pointer to a release routine -- the kref contains only the atomic

Yes, this is implied.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12  8:54                     ` Oliver Neukum
@ 2007-03-12 14:57                       ` Alan Stern
  2007-03-12 15:23                         ` Oliver Neukum
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-12 14:57 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Oliver Neukum wrote:

> > > Why? What's wrong with simply calling kref_get/put?
> > 
> > It's the same old problem: the race between unbind and sysfs I/O.  What
> > good does holding a reference to the private data structure do if the
> > show/store method gets called after the driver has been unbound from the
> > device?  dev_get_drvdata() will no longer provide a valid pointer to the
> > private data, so the method will have no way to access it.  Hence the
> > method needs another argument.
> 
> It does half the job. You can make sure the driver is not asked to access
> freed memory.
> It is true that a driver will have to mark that device "disconnected"
> and return errors if that device's attributes are referenced, but this can
> be done internally.

No, you're missing the point.  Let's say driver A's disconnect() is
called, so the driver marks its private data structure as "disconnected"
and does dev_set_drvdata(NULL).  Then driver B is probed and bound to the
device, and it does its own dev_set_drvdata().  Then a user still holding
an open sysfs file reference for driver A calls a show() or store()  
method.  The method will do dev_get_drvdata(), receiving the pointer to
driver B's private data.  Now you're in trouble, because A's method will
think it owns B's private data!

> Yes, this is a bit more complicated.
> {rant mode}
> Who came up with the idea of making life simpler by adding a code path?
> All these problems were already solved for device nodes. Ioctl is ugly, but
> at least a known code path.
> {rant off}

I'll let Greg give the complete answer.  :-)  Bear in mind, however, that
the aim was probably to make life simpler for userspace -- which does not
mean making life simpler for the kernel.

(Incidentally, I'm not so sure that all these problems really were solved 
by ioctl on device nodes.  I bet you could find plenty of cases where 
ioctl races with disconnect if you looked.)

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 14:57                       ` Alan Stern
@ 2007-03-12 15:23                         ` Oliver Neukum
  2007-03-12 15:42                           ` Dmitry Torokhov
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-12 15:23 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 15:57 schrieb Alan Stern:probably nece
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
> 
> > > > Why? What's wrong with simply calling kref_get/put?
> > > 
> > > It's the same old problem: the race between unbind and sysfs I/O.  What
> > > good does holding a reference to the private data structure do if the
> > > show/store method gets called after the driver has been unbound from the
> > > device?  dev_get_drvdata() will no longer provide a valid pointer to the
> > > private data, so the method will have no way to access it.  Hence the
> > > method needs another argument.
> > 
> > It does half the job. You can make sure the driver is not asked to access
> > freed memory.
> > It is true that a driver will have to mark that device "disconnected"
> > and return errors if that device's attributes are referenced, but this can
> > be done internally.
> 
> No, you're missing the point.  Let's say driver A's disconnect() is
> called, so the driver marks its private data structure as "disconnected"
> and does dev_set_drvdata(NULL).  Then driver B is probed and bound to the
> device, and it does its own dev_set_drvdata().  Then a user still holding
> an open sysfs file reference for driver A calls a show() or store()  
> method.  The method will do dev_get_drvdata(), receiving the pointer to
> driver B's private data.  Now you're in trouble, because A's method will
> think it owns B's private data!

Yes, I was missing the point. In consequence, drivers must not use
dev_get_drvdata() to get their references to their private data. It's
probably necessary to store it in struct sysfs_buffer and include that
in the store/show callbacks.
(The same does apply to interfaces of course)

> > Yes, this is a bit more complicated.
> > {rant mode}
> > Who came up with the idea of making life simpler by adding a code path?
> > All these problems were already solved for device nodes. Ioctl is ugly, but
> > at least a known code path.
> > {rant off}
> 
> I'll let Greg give the complete answer.  :-)  Bear in mind, however, that
> the aim was probably to make life simpler for userspace -- which does not
> mean making life simpler for the kernel.

That doesn't mean that the method needed to be thrown out.
Sysfs could simply pass through the syscalls for a device, like
it is done in character devices. I am tempted to recommend
such radical surgery.

> (Incidentally, I'm not so sure that all these problems really were solved 
> by ioctl on device nodes.  I bet you could find plenty of cases where 
> ioctl races with disconnect if you looked.)

I will look. Death to all race conditions.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 15:23                         ` Oliver Neukum
@ 2007-03-12 15:42                           ` Dmitry Torokhov
  2007-03-12 15:59                             ` Oliver Neukum
  0 siblings, 1 reply; 51+ messages in thread
From: Dmitry Torokhov @ 2007-03-12 15:42 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

On 3/12/07, Oliver Neukum <oneukum@suse.de> wrote:
> Am Montag, 12. März 2007 15:57 schrieb Alan Stern:probably nece
> > On Mon, 12 Mar 2007, Oliver Neukum wrote:
> >
> > > > > Why? What's wrong with simply calling kref_get/put?
> > > >
> > > > It's the same old problem: the race between unbind and sysfs I/O.  What
> > > > good does holding a reference to the private data structure do if the
> > > > show/store method gets called after the driver has been unbound from the
> > > > device?  dev_get_drvdata() will no longer provide a valid pointer to the
> > > > private data, so the method will have no way to access it.  Hence the
> > > > method needs another argument.
> > >
> > > It does half the job. You can make sure the driver is not asked to access
> > > freed memory.
> > > It is true that a driver will have to mark that device "disconnected"
> > > and return errors if that device's attributes are referenced, but this can
> > > be done internally.
> >
> > No, you're missing the point.  Let's say driver A's disconnect() is
> > called, so the driver marks its private data structure as "disconnected"
> > and does dev_set_drvdata(NULL).  Then driver B is probed and bound to the
> > device, and it does its own dev_set_drvdata().  Then a user still holding
> > an open sysfs file reference for driver A calls a show() or store()
> > method.  The method will do dev_get_drvdata(), receiving the pointer to
> > driver B's private data.  Now you're in trouble, because A's method will
> > think it owns B's private data!
>
> Yes, I was missing the point. In consequence, drivers must not use
> dev_get_drvdata() to get their references to their private data. It's
> probably necessary to store it in struct sysfs_buffer and include that
> in the store/show callbacks.
> (The same does apply to interfaces of course)
>

Or drivers coudl verify that they still bound to the device they are
about to operate on (psmouse does this by taking a lock on device and
then checking if driver bound is the same address as psmouse). But I'd
rather get rid of all this clutter if we could sever sysfs access
after removing corresponding attributes.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 15:42                           ` Dmitry Torokhov
@ 2007-03-12 15:59                             ` Oliver Neukum
  2007-03-12 16:21                               ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-12 15:59 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Alan Stern, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 16:42 schrieb Dmitry Torokhov:
> On 3/12/07, Oliver Neukum <oneukum@suse.de> wrote:
> > Am Montag, 12. März 2007 15:57 schrieb Alan Stern
> > > No, you're missing the point.  Let's say driver A's disconnect() is
> > > called, so the driver marks its private data structure as "disconnected"
> > > and does dev_set_drvdata(NULL).  Then driver B is probed and bound to the
> > > device, and it does its own dev_set_drvdata().  Then a user still holding
> > > an open sysfs file reference for driver A calls a show() or store()
> > > method.  The method will do dev_get_drvdata(), receiving the pointer to
> > > driver B's private data.  Now you're in trouble, because A's method will
> > > think it owns B's private data!
> >
> > Yes, I was missing the point. In consequence, drivers must not use
> > dev_get_drvdata() to get their references to their private data. It's
> > probably necessary to store it in struct sysfs_buffer and include that
> > in the store/show callbacks.
> > (The same does apply to interfaces of course)
> >
> 
> Or drivers coudl verify that they still bound to the device they are
> about to operate on (psmouse does this by taking a lock on device and
> then checking if driver bound is the same address as psmouse). But I'd
> rather get rid of all this clutter if we could sever sysfs access
> after removing corresponding attributes.

No, the call has to fail if the driver is rebound to the device.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 15:59                             ` Oliver Neukum
@ 2007-03-12 16:21                               ` Alan Stern
  2007-03-12 18:25                                 ` Oliver Neukum
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-12 16:21 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Oliver Neukum wrote:

> > > Yes, I was missing the point. In consequence, drivers must not use
> > > dev_get_drvdata() to get their references to their private data. It's

You do realize how foolish that sounds?  Why do you think 
dev_get_drvdata() was written in the first place?

> > > probably necessary to store it in struct sysfs_buffer and include that
> > > in the store/show callbacks.
> > > (The same does apply to interfaces of course)
> > >
> > 
> > Or drivers coudl verify that they still bound to the device they are
> > about to operate on (psmouse does this by taking a lock on device and
> > then checking if driver bound is the same address as psmouse). But I'd
> > rather get rid of all this clutter if we could sever sysfs access
> > after removing corresponding attributes.
> 
> No, the call has to fail if the driver is rebound to the device.

I'm with Dmitry; the whole thing becomes much, much simpler if we put back
your patch and prevent sysfs access after unregistering an attribute 
file.  No API changes are needed, no driver changes are needed, no radical 
core changes are needed,...  All we would have to do is fix the one SCSI 
method to make it use a workqueue.

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 16:21                               ` Alan Stern
@ 2007-03-12 18:25                                 ` Oliver Neukum
  2007-03-12 19:31                                   ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-12 18:25 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 17:21 schrieb Alan Stern:
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
> 
> > > > Yes, I was missing the point. In consequence, drivers must not use
> > > > dev_get_drvdata() to get their references to their private data. It's
> 
> You do realize how foolish that sounds?  Why do you think 
> dev_get_drvdata() was written in the first place?

It's still useful in disconnect/suspend/resume/etc...
If everything were alright with the design, we wouldn't be discussing
it now, would we?

 
> I'm with Dmitry; the whole thing becomes much, much simpler if we put back
> your patch and prevent sysfs access after unregistering an attribute 
> file.  No API changes are needed, no driver changes are needed, no radical 
> core changes are needed,...  All we would have to do is fix the one SCSI 
> method to make it use a workqueue.

Try. I don't like reverting my own code. But I predict he'll tell you that a
driver's bond with a device should be represented in a data structure
that is to be refcounted.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 18:25                                 ` Oliver Neukum
@ 2007-03-12 19:31                                   ` Alan Stern
  2007-03-12 19:49                                     ` Oliver Neukum
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-12 19:31 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Oliver Neukum wrote:

> > I'm with Dmitry; the whole thing becomes much, much simpler if we put back
> > your patch and prevent sysfs access after unregistering an attribute 
> > file.  No API changes are needed, no driver changes are needed, no radical 
> > core changes are needed,...  All we would have to do is fix the one SCSI 
> > method to make it use a workqueue.
> 
> Try.

I did.  Didn't you see this message from Saturday:

http://marc.theaimsgroup.com/?l=linux-kernel&m=117355959020831&w=2

I sent it to Linus as well as to all of you.  No replies received so far 
from anybody.

>  I don't like reverting my own code. But I predict he'll tell you that a
> driver's bond with a device should be represented in a data structure
> that is to be refcounted.

:-)

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 19:31                                   ` Alan Stern
@ 2007-03-12 19:49                                     ` Oliver Neukum
  2007-03-12 20:03                                       ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-12 19:49 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 20:31 schrieb Alan Stern:
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
> 
> > > I'm with Dmitry; the whole thing becomes much, much simpler if we put back
> > > your patch and prevent sysfs access after unregistering an attribute 
> > > file.  No API changes are needed, no driver changes are needed, no radical 
> > > core changes are needed,...  All we would have to do is fix the one SCSI 
> > > method to make it use a workqueue.
> > 
> > Try.
> 
> I did.  Didn't you see this message from Saturday:
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=117355959020831&w=2

Yes. In this case, silence is partial agreement. However, convincing me
is futile if Linus rejects the approach.

I wrote the original patch. But this problem must be solved. If the
first attempt is rejected, I'll try another.

> I sent it to Linus as well as to all of you.  No replies received so far 
> from anybody.
> 
> >  I don't like reverting my own code. But I predict he'll tell you that a
> > driver's bond with a device should be represented in a data structure
> > that is to be refcounted.
> 
> :-)

Coming to think about it, he might be right there.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 19:49                                     ` Oliver Neukum
@ 2007-03-12 20:03                                       ` Alan Stern
  2007-03-12 20:15                                         ` Oliver Neukum
  2007-03-12 20:31                                         ` Dmitry Torokhov
  0 siblings, 2 replies; 51+ messages in thread
From: Alan Stern @ 2007-03-12 20:03 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Oliver Neukum wrote:

> > >  I don't like reverting my own code. But I predict he'll tell you that a
> > > driver's bond with a device should be represented in a data structure
> > > that is to be refcounted.
> > 
> > :-)
> 
> Coming to think about it, he might be right there.

There still would be a synchronization problem.  Refcounts don't solve
races; they only solve lifetime problems.  And you would still have to
change the sysfs API, plus all the other stuff...

Do you think Linus would listen if all three of us (plus maybe Greg) tried 
to convince him?

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 20:03                                       ` Alan Stern
@ 2007-03-12 20:15                                         ` Oliver Neukum
  2007-03-12 20:31                                         ` Dmitry Torokhov
  1 sibling, 0 replies; 51+ messages in thread
From: Oliver Neukum @ 2007-03-12 20:15 UTC (permalink / raw)
  To: Alan Stern; +Cc: Dmitry Torokhov, Maneesh Soni, gregkh, linux-kernel

Am Montag, 12. März 2007 21:03 schrieb Alan Stern:
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
> 
> > > >  I don't like reverting my own code. But I predict he'll tell you that a
> > > > driver's bond with a device should be represented in a data structure
> > > > that is to be refcounted.
> > > 
> > > :-)
> > 
> > Coming to think about it, he might be right there.
> 
> There still would be a synchronization problem.  Refcounts don't solve
> races; they only solve lifetime problems.  And you would still have to
> change the sysfs API, plus all the other stuff...
> 
> Do you think Linus would listen if all three of us (plus maybe Greg) tried 
> to convince him?

No. He'd tell you that a crap API should be changed.

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 20:03                                       ` Alan Stern
  2007-03-12 20:15                                         ` Oliver Neukum
@ 2007-03-12 20:31                                         ` Dmitry Torokhov
  2007-03-12 20:45                                           ` Alan Stern
  2007-03-12 21:31                                           ` Richard Purdie
  1 sibling, 2 replies; 51+ messages in thread
From: Dmitry Torokhov @ 2007-03-12 20:31 UTC (permalink / raw)
  To: Alan Stern; +Cc: Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On 3/12/07, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Mon, 12 Mar 2007, Oliver Neukum wrote:
>
> > > >  I don't like reverting my own code. But I predict he'll tell you that a
> > > > driver's bond with a device should be represented in a data structure
> > > > that is to be refcounted.
> > >
> > > :-)
> >
> > Coming to think about it, he might be right there.
>
> There still would be a synchronization problem.  Refcounts don't solve
> races; they only solve lifetime problems.  And you would still have to
> change the sysfs API, plus all the other stuff...
>
> Do you think Linus would listen if all three of us (plus maybe Greg) tried
> to convince him?
>

If we'd accompany the argument with the patch that changes scsi to use
wq to perform deletion so we don't have deadlock regression in the
kernel he might be more perceptive... He is right about lifetime
issues but this is not strictly lifetime issue as you correctly point
out. Plus, refcounting also bloats the kernel so I don't relly want to
use refcount for every integer I happen to export through sysfs if I
can simply "revoke" access.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 20:31                                         ` Dmitry Torokhov
@ 2007-03-12 20:45                                           ` Alan Stern
  2007-03-12 21:31                                           ` Richard Purdie
  1 sibling, 0 replies; 51+ messages in thread
From: Alan Stern @ 2007-03-12 20:45 UTC (permalink / raw)
  To: Dmitry Torokhov; +Cc: Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On Mon, 12 Mar 2007, Dmitry Torokhov wrote:

> > Do you think Linus would listen if all three of us (plus maybe Greg) tried
> > to convince him?
> >
> 
> If we'd accompany the argument with the patch that changes scsi to use
> wq to perform deletion so we don't have deadlock regression in the
> kernel he might be more perceptive...

I wrote that patch over the weekend but forgot to bring it in to work.  
I'll post it tonight or tomorrow.

>  He is right about lifetime
> issues but this is not strictly lifetime issue as you correctly point
> out. Plus, refcounting also bloats the kernel so I don't relly want to
> use refcount for every integer I happen to export through sysfs if I
> can simply "revoke" access.

Agreed.

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: refcounting drivers' data structures used in sysfs buffers
  2007-03-12 20:31                                         ` Dmitry Torokhov
  2007-03-12 20:45                                           ` Alan Stern
@ 2007-03-12 21:31                                           ` Richard Purdie
  2007-03-13 15:00                                             ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
  1 sibling, 1 reply; 51+ messages in thread
From: Richard Purdie @ 2007-03-12 21:31 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Alan Stern, Oliver Neukum, Maneesh Soni, gregkh, linux-kernel

On Mon, 2007-03-12 at 16:31 -0400, Dmitry Torokhov wrote:
> On 3/12/07, Alan Stern <stern@rowland.harvard.edu> wrote:
> > On Mon, 12 Mar 2007, Oliver Neukum wrote:
> > > > >  I don't like reverting my own code. But I predict he'll tell you that a
> > > > > driver's bond with a device should be represented in a data structure
> > > > > that is to be refcounted.
> >
> > There still would be a synchronization problem.  Refcounts don't solve
> > races; they only solve lifetime problems.  And you would still have to
> > change the sysfs API, plus all the other stuff...
> >
> > Do you think Linus would listen if all three of us (plus maybe Greg) tried
> > to convince him?
> >
> 
> If we'd accompany the argument with the patch that changes scsi to use
> wq to perform deletion so we don't have deadlock regression in the
> kernel he might be more perceptive... He is right about lifetime
> issues but this is not strictly lifetime issue as you correctly point
> out. Plus, refcounting also bloats the kernel so I don't relly want to
> use refcount for every integer I happen to export through sysfs if I
> can simply "revoke" access.

For what its worth, I think it makes sense if the driver no longer has
to worry about sysfs attributes after they've been removed. This is
something the core should look after, not each and every driver.

http://marc.theaimsgroup.com/?l=linux-kernel&m=117355959020831&w=2

makes a lot of sense, particularly that "No driver callbacks occur after
unregistration". When writing the backlight class code, I remember
checking into this, concluding that seemed to be the design of sysfs and
thinking it a sane design.

The alternative is to force each and every driver to do its own
refcounting. My experience with locking in the extremely simple
backlight class shows nobody reads the documentation or writes the code
correctly. With that, I've given up and added suitable locking to the
core even if not every driver needs it. In doing so, I made a net
removal of a few hundred lines of broken "ticking timebomb" style code.
I dread to think what would happen if every driver had to deal with
sysfs refcounting.

So count me as a vote for handling this in the sysfs core, not the
drivers.

Richard



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-12 21:31                                           ` Richard Purdie
@ 2007-03-13 15:00                                             ` Alan Stern
  2007-03-13 18:42                                               ` Cornelia Huck
  2007-03-13 19:00                                               ` 2.6.21-rc suspend regression: sysfs deadlock Hugh Dickins
  0 siblings, 2 replies; 51+ messages in thread
From: Alan Stern @ 2007-03-13 15:00 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue. 6 Mar 2007, Hugh Dickins wrote:

> But suspend to RAM still hanging, unless I "chmod a-x /usr/sbin/docker"
> on SuSE 10.2: docker undock tries to unregister /sys/block/sr0 and hangs:
> 
> 60x60         D B0415080     0 10778  10771                     (NOTLB)
>        e8227e04 00000086 e80c60b0 b0415080 ef3f5454 b041dc20 ef3f5430 00000001 
>        e80c60b0 72af360e 00000085 00001941 e80c61bc e8227e00 b01606bf ef47d3c0 
>        ed07c1dc ed07c1e4 00000246 e8227e30 b02f6ef0 e80c60b0 00000001 e80c60b0 
> Call Trace:
>  [<b02f6ef0>] __down+0xaa/0xb8
>  [<b02f6de6>] __down_failed+0xa/0x10
>  [<b0180529>] sysfs_drop_dentry+0xa2/0xda
>  [<b01819b3>] __sysfs_remove_dir+0x6d/0xf8
>  [<b0181a53>] sysfs_remove_dir+0x15/0x20
>  [<b01d49a9>] kobject_del+0x16/0x22
>  [<b0230041>] device_del+0x1c9/0x1e2
>  [<b025705a>] __scsi_remove_device+0x43/0x7a
>  [<b02570b0>] scsi_remove_device+0x1f/0x2b
>  [<b0256a44>] sdev_store_delete+0x16/0x1b
>  [<b022f0a0>] dev_attr_store+0x32/0x34
>  [<b0180931>] flush_write_buffer+0x37/0x3d
>  [<b0180995>] sysfs_write_file+0x5e/0x82
>  [<b01507f5>] vfs_write+0xa7/0x150
>  [<b0150950>] sys_write+0x47/0x6b
>  [<b0103d56>] sysenter_past_esp+0x5f/0x85
>               /usr/lib/dockutils/hooks/thinkpad/60x60 undock
>               /usr/lib/dockutils/dockhandler undock
>               /usr/sbin/docker undock
>               /etc/pm/hooks/23dock suspend
> 
> This comes from Oliver's commit 94bebf4d1b8e7719f0f3944c037a21cfd99a4af7
> Driver core: fix race in sysfs between sysfs_remove_file() and read()/write()
> in 2.6.21-rc1.  It looks to me like sysfs_write_file downs buffer->sem
> while calling flush_write_buffer, and flushing that particular write
> buffer entails downing buffer->sem in orphan_all_buffers.
> 
> Suspend no longer deadlocks with the following silly patch, but I expect
> this either pokes a small hole in your scheme, or renders it pointless.
> Maybe that commit needs to be reverted, or maybe you can see how to fix
> it up for -rc3.
> 
> Thanks,
> Hugh
> 
> --- 2.6.21-rc2-git5/fs/sysfs/inode.c	2007-02-28 08:30:26.000000000 
> +0000
> +++ linux/fs/sysfs/inode.c	2007-03-06 18:03:13.000000000 +0000
> @@ -227,11 +227,8 @@ static inline void orphan_all_buffers(st
>  
>  	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
>  	if (node->i_private) {
> -		list_for_each_entry(buf, &set->associates, associates) {
> -			down(&buf->sem);
> +		list_for_each_entry(buf, &set->associates, associates)
>  			buf->orphaned = 1;
> -			up(&buf->sem);
> -		}
>  	}
>  	mutex_unlock(&node->i_mutex);
>  }

Hugh, there has been a long discussion among several people concerning 
this issue.  See for example this thread:

http://marc.info/?t=117335935200001&r=1&w=2

and also:

http://marc.info/?l=linux-kernel&m=117355959020831&w=2

The consensus is that we would be better off keeping Oliver's original 
patch without your silly change, and instead fixing the particular method 
call that deadlocked.  Can you please try out the patch below with 
everything else as it was before?  It should solve your problem.

Alan Stern


Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -452,10 +452,39 @@ store_rescan_field (struct device *dev, 
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
+/* An attribute method cannot unregister itself, so this workaround for
+ * sdev_store_delete() is necessary.
+ */
+struct sdev_work_struct {
+	struct scsi_device *sdev;
+	struct work_struct work;
+};
+
+static void sdev_store_delete_work(struct work_struct *work)
+{
+	struct sdev_work_struct *sdw = container_of(work,
+			struct sdev_work_struct, work);
+
+	scsi_remove_device(sdw->sdev);
+	scsi_device_put(sdw->sdev);
+	kfree(sdw);
+}
+
 static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf,
 				 size_t count)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	struct scsi_device *sdev = to_scsi_device(dev);
+	struct sdev_work_struct *sdw;
+
+	sdw = kmalloc(sizeof(*sdw), GFP_KERNEL);
+	if (!sdw)
+		return -ENOMEM;
+	sdw->sdev = sdev;
+	INIT_WORK(&sdw->work, sdev_store_delete_work);
+	if (scsi_device_get(sdev) != 0)
+		kfree(sdw);
+	else
+		schedule_work(&sdw->work);
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 15:00                                             ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
@ 2007-03-13 18:42                                               ` Cornelia Huck
  2007-03-13 21:20                                                 ` Linus Torvalds
  2007-03-13 19:00                                               ` 2.6.21-rc suspend regression: sysfs deadlock Hugh Dickins
  1 sibling, 1 reply; 51+ messages in thread
From: Cornelia Huck @ 2007-03-13 18:42 UTC (permalink / raw)
  To: Alan Stern
  Cc: Hugh Dickins, Dmitry Torokhov, Oliver Neukum, Maneesh Soni,
	gregkh, Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007 11:00:21 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> The consensus is that we would be better off keeping Oliver's original 
> patch without your silly change, and instead fixing the particular method 
> call that deadlocked.

Another call that deadlocked with Oliver's patch is ungroup for s390
ccwgroup devices. It can be made to work again with a similar patch.

Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>

---
 drivers/s390/cio/ccwgroup.c |   35 +++++++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

--- linux-2.6.orig/drivers/s390/cio/ccwgroup.c
+++ linux-2.6/drivers/s390/cio/ccwgroup.c
@@ -67,22 +67,49 @@ __ccwgroup_remove_symlinks(struct ccwgro
 	
 }
 
+struct ccwgroup_work_struct {
+	struct ccwgroup_device *gdev;
+	struct work_struct work;
+};
+
+static void ccwgroup_ungroup_work(struct work_struct *work)
+{
+	struct ccwgroup_work_struct *ungroup_work
+		= container_of(work, struct ccwgroup_work_struct, work);
+
+	__ccwgroup_remove_symlinks(ungroup_work->gdev);
+	device_unregister(&ungroup_work->gdev->dev);
+	put_device(&ungroup_work->gdev->dev);
+	kfree(ungroup_work);
+}
+
 /*
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
+ * Note that we cannot unregister the device from one of its attribute
+ * methods, so we have to delay it.
  */
-static ssize_t
-ccwgroup_ungroup_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
+static ssize_t ccwgroup_ungroup_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
 {
 	struct ccwgroup_device *gdev;
+	struct ccwgroup_work_struct *ungroup_work;
 
 	gdev = to_ccwgroupdev(dev);
 
 	if (gdev->state != CCWGROUP_OFFLINE)
 		return -EINVAL;
 
-	__ccwgroup_remove_symlinks(gdev);
-	device_unregister(dev);
+	ungroup_work = kmalloc(sizeof(*ungroup_work), GFP_KERNEL);
+	if (!ungroup_work)
+		return -ENOMEM;
+	ungroup_work->gdev = gdev;
+	INIT_WORK(&ungroup_work->work, ccwgroup_ungroup_work);
+	if (!get_device(&gdev->dev))
+		kfree(ungroup_work);
+	else
+		schedule_work(&ungroup_work->work);
 
 	return count;
 }

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 15:00                                             ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
  2007-03-13 18:42                                               ` Cornelia Huck
@ 2007-03-13 19:00                                               ` Hugh Dickins
  2007-03-13 20:09                                                 ` Alan Stern
  1 sibling, 1 reply; 51+ messages in thread
From: Hugh Dickins @ 2007-03-13 19:00 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Alan Stern wrote:
> 
> The consensus is that we would be better off keeping Oliver's original 
> patch without your silly change, and instead fixing the particular method 
> call that deadlocked.  Can you please try out the patch below with 
> everything else as it was before?  It should solve your problem.

Yep, it works fine with your patch in and my silly reverted, thanks.
But (I was about to say, even before seeing Cornelia's reply, honest!)
I think you do need to check (audit the source? or is some runtime
check possible?) for other such "suicidal" sysfs files, which
seemed to (sysfs-ignorant) me to pose the real problem.

Hugh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 19:00                                               ` 2.6.21-rc suspend regression: sysfs deadlock Hugh Dickins
@ 2007-03-13 20:09                                                 ` Alan Stern
  2007-03-13 20:55                                                   ` Hugh Dickins
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-13 20:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Hugh Dickins wrote:

> On Tue, 13 Mar 2007, Alan Stern wrote:
> > 
> > The consensus is that we would be better off keeping Oliver's original 
> > patch without your silly change, and instead fixing the particular method 
> > call that deadlocked.  Can you please try out the patch below with 
> > everything else as it was before?  It should solve your problem.
> 
> Yep, it works fine with your patch in and my silly reverted, thanks.
> But (I was about to say, even before seeing Cornelia's reply, honest!)
> I think you do need to check (audit the source? or is some runtime
> check possible?) for other such "suicidal" sysfs files, which
> seemed to (sysfs-ignorant) me to pose the real problem.

A runtime check wouldn't detect anything until someone tried to use the 
file -- at which point the process would deadlock anyway.

On the other hand, a quick survey of the kernel source shows that
DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
for the faint-of-heart!

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 20:09                                                 ` Alan Stern
@ 2007-03-13 20:55                                                   ` Hugh Dickins
  2007-03-13 21:08                                                     ` Dmitry Torokhov
  2007-03-13 21:20                                                     ` Alan Stern
  0 siblings, 2 replies; 51+ messages in thread
From: Hugh Dickins @ 2007-03-13 20:55 UTC (permalink / raw)
  To: Alan Stern
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Alan Stern wrote:
> 
> On the other hand, a quick survey of the kernel source shows that
> DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> for the faint-of-heart!

Indeed, and faint-hearted Hugh wasn't intending to do so: but
stout-hearted Alan will need to, won't he, before his patch can go in?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 20:55                                                   ` Hugh Dickins
@ 2007-03-13 21:08                                                     ` Dmitry Torokhov
  2007-03-13 21:20                                                     ` Alan Stern
  1 sibling, 0 replies; 51+ messages in thread
From: Dmitry Torokhov @ 2007-03-13 21:08 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Stern, Oliver Neukum, Maneesh Soni, gregkh, Richard Purdie,
	James Bottomley, Linus Torvalds, Kernel development list

On 3/13/07, Hugh Dickins <hugh@veritas.com> wrote:
> On Tue, 13 Mar 2007, Alan Stern wrote:
> >
> > On the other hand, a quick survey of the kernel source shows that
> > DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> > for the faint-of-heart!
>
> Indeed, and faint-hearted Hugh wasn't intending to do so: but
> stout-hearted Alan will need to, won't he, before his patch can go in?
>

I think we could rely on subsystems maintainers to let us know if
there are potential problems. For example I can tell that neither
input, serio nor gameport subsystems use sysfs to destroy their
devices (action on sysfs may cause some other device to be destroyed
but that should be ok, only self-destruction is not allowed, right?)

-- 
Dmitry

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 18:42                                               ` Cornelia Huck
@ 2007-03-13 21:20                                                 ` Linus Torvalds
  2007-03-14 16:12                                                   ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2007-03-13 21:20 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alan Stern, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list



On Tue, 13 Mar 2007, Cornelia Huck wrote:
> 
> Another call that deadlocked with Oliver's patch is ungroup for s390
> ccwgroup devices. It can be made to work again with a similar patch.

Could we please make this easier to use by having some common sysfs helper 
routine for this kind of "delayed_store()" functionality.

I'm not a huge fan of delayed work at all, but if we have to have it, at 
least make it one generic function rather than having multiple functions 
all doing their own workqueue logic for it.

		Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 20:55                                                   ` Hugh Dickins
  2007-03-13 21:08                                                     ` Dmitry Torokhov
@ 2007-03-13 21:20                                                     ` Alan Stern
  1 sibling, 0 replies; 51+ messages in thread
From: Alan Stern @ 2007-03-13 21:20 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dmitry Torokhov, Oliver Neukum, Maneesh Soni, gregkh,
	Richard Purdie, James Bottomley, Linus Torvalds,
	Kernel development list

On Tue, 13 Mar 2007, Hugh Dickins wrote:

> On Tue, 13 Mar 2007, Alan Stern wrote:
> > 
> > On the other hand, a quick survey of the kernel source shows that
> > DEVICE_ATTR is used over 1500 times.  Auditing all of them is not a job
> > for the faint-of-heart!
> 
> Indeed, and faint-hearted Hugh wasn't intending to do so: but
> stout-hearted Alan will need to, won't he, before his patch can go in?

Allow me to point out that the original patch is Oliver's (although I
helped), and it doesn't need to go in -- it needs not to be removed.

Furthermore, I have better things to do with the next month of my time 
than auditing hundreds of routines I don't understand for behavior I 
probably won't be able to recognize.  (Although at 50 a day... hmmm, 
maybe.)

This sounds more like a job for kernel-janitors!


On Tue, 13 Mar 2007, Dmitry Torokhov wrote:

> I think we could rely on subsystems maintainers to let us know if
> there are potential problems. For example I can tell that neither
> input, serio nor gameport subsystems use sysfs to destroy their  
> devices (action on sysfs may cause some other device to be destroyed
> but that should be ok, only self-destruction is not allowed, right?)

Very good points.  USB doesn't do anything like that either.  And right, 
it's okay for a method to destroy other devices; it just can't do anything 
that would lead to its own unregistration.

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-13 21:20                                                 ` Linus Torvalds
@ 2007-03-14 16:12                                                   ` Alan Stern
  2007-03-14 18:43                                                     ` Cornelia Huck
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-14 16:12 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Tue, 13 Mar 2007, Linus Torvalds wrote:

> Could we please make this easier to use by having some common sysfs helper 
> routine for this kind of "delayed_store()" functionality.
> 
> I'm not a huge fan of delayed work at all, but if we have to have it, at 
> least make it one generic function rather than having multiple functions 
> all doing their own workqueue logic for it.

This seems more elegant (not yet tested).  Cornelia, does it look okay to 
you?

Alan Stern


Index: usb-2.6/include/linux/sysfs.h
===================================================================
--- usb-2.6.orig/include/linux/sysfs.h
+++ usb-2.6/include/linux/sysfs.h
@@ -78,6 +78,9 @@ struct sysfs_ops {
 
 #ifdef CONFIG_SYSFS
 
+extern int sysfs_access_in_other_task(struct kobject *kobj,
+		void (*func)(void *), void *data);
+
 extern int __must_check
 sysfs_create_dir(struct kobject *, struct dentry *);
 
@@ -133,6 +136,12 @@ extern int __must_check sysfs_init(void)
 
 #else /* CONFIG_SYSFS */
 
+static inline int sysfs_access_in_other_task(struct kobject *kobj,
+		void (*func)(void *), void *data)
+{
+	return -ENOSYS;
+}
+
 static inline int sysfs_create_dir(struct kobject * k, struct dentry *shadow)
 {
 	return 0;
Index: usb-2.6/fs/sysfs/file.c
===================================================================
--- usb-2.6.orig/fs/sysfs/file.c
+++ usb-2.6/fs/sysfs/file.c
@@ -643,6 +643,59 @@ void sysfs_remove_file_from_group(struct
 }
 EXPORT_SYMBOL_GPL(sysfs_remove_file_from_group);
 
+struct other_task_struct {
+	struct kobject 		*kobj;
+	void			(*func)(void *);
+	void			*data;
+	struct work_struct	work;
+};
+
+static void other_task_work(struct work_struct *work)
+{
+	struct other_task_struct *ots = container_of(work,
+			struct other_task_struct, work);
+
+	(ots->func)(ots->data);
+	kobject_put(ots->kobj);
+	kfree(ots);
+}
+
+/**
+ * sysfs_access_in_other_task - delay access from an attribute method.
+ * @kobj: object we're acting for.
+ * @func: callback function to invoke later.
+ * @data: argument to pass to @func.
+ *
+ * sysfs attribute methods must not unregister themselves or their parent
+ * kobject (which would amount to the same thing).  Attempts to do so will
+ * deadlock, since unregistration is mutually exclusive with driver
+ * callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to carry out the requested function
+ * in the workqueue's process context.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int sysfs_access_in_other_task(struct kobject *kobj, void (*func)(void *),
+		void *data)
+{
+	struct other_task_struct *ots;
+
+	ots = kmalloc(sizeof(*ots), GFP_KERNEL);
+	if (!ots)
+		return -ENOMEM;
+	kobject_get(kobj);
+	ots->kobj = kobj;
+	ots->func = func;
+	ots->data = data;
+	INIT_WORK(&ots->work, other_task_work);
+	schedule_work(&ots->work);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sysfs_access_in_other_task);
+
 
 EXPORT_SYMBOL_GPL(sysfs_create_file);
 EXPORT_SYMBOL_GPL(sysfs_remove_file);
Index: usb-2.6/include/linux/device.h
===================================================================
--- usb-2.6.orig/include/linux/device.h
+++ usb-2.6/include/linux/device.h
@@ -356,6 +356,8 @@ extern int __must_check device_create_bi
 					       struct bin_attribute *attr);
 extern void device_remove_bin_file(struct device *dev,
 				   struct bin_attribute *attr);
+extern int device_access_in_other_task(struct device *dev,
+		void (*func)(struct device *));
 
 /* device resource management */
 typedef void (*dr_release_t)(struct device *dev, void *res);
Index: usb-2.6/drivers/base/core.c
===================================================================
--- usb-2.6.orig/drivers/base/core.c
+++ usb-2.6/drivers/base/core.c
@@ -407,6 +407,30 @@ void device_remove_bin_file(struct devic
 }
 EXPORT_SYMBOL_GPL(device_remove_bin_file);
 
+/**
+ * device_access_in_other_task - delay access from an attribute method.
+ * @dev: device.
+ * @func: callback function to invoke later.
+ *
+ * Attribute methods must not unregister themselves or their parent device
+ * (which would amount to the same thing).  Attempts to do so will deadlock,
+ * since unregistration is mutually exclusive with driver callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to carry out the requested function
+ * in the workqueue's process context.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int device_access_in_other_task(struct device *dev,
+		void (*func)(struct device *))
+{
+	return sysfs_access_in_other_task(&dev->kobj,
+			(void (*)(void *)) func, dev);
+}
+EXPORT_SYMBOL_GPL(device_access_in_other_task);
+
 static void klist_children_get(struct klist_node *n)
 {
 	struct device *dev = container_of(n, struct device, knode_parent);
Index: usb-2.6/drivers/scsi/scsi_sysfs.c
===================================================================
--- usb-2.6.orig/drivers/scsi/scsi_sysfs.c
+++ usb-2.6/drivers/scsi/scsi_sysfs.c
@@ -452,10 +452,22 @@ store_rescan_field (struct device *dev, 
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
+static void sdev_store_delete_callback(struct device *dev)
+{
+	scsi_remove_device(to_scsi_device(dev));
+}
+
 static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf,
 				 size_t count)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	int rc;
+
+	/* An attribute cannot be unregistered by one of its own methods,
+	 * so we have to use device_access_in_other_task().
+	 */
+	rc = device_access_in_other_task(dev, sdev_store_delete_callback);
+	if (rc)
+		count = rc;
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
Index: usb-2.6/drivers/s390/cio/ccwgroup.c
===================================================================
--- usb-2.6.orig/drivers/s390/cio/ccwgroup.c
+++ usb-2.6/drivers/s390/cio/ccwgroup.c
@@ -71,19 +71,31 @@ __ccwgroup_remove_symlinks(struct ccwgro
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
  */
+static void ccwgroup_ungroup_callback(struct device *dev)
+{
+	struct ccwgroup_device *gdev = to_ccwgroupdev(dev);
+
+	__ccwgroup_remove_symlinks(gdev);
+	device_unregister(dev);
+}
+
 static ssize_t
 ccwgroup_ungroup_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
 {
 	struct ccwgroup_device *gdev;
+	int rc;
 
 	gdev = to_ccwgroupdev(dev);
 
 	if (gdev->state != CCWGROUP_OFFLINE)
 		return -EINVAL;
 
-	__ccwgroup_remove_symlinks(gdev);
-	device_unregister(dev);
-
+	/* Note that we cannot unregister the device from one of its
+	 * attribute methods, so we have to delay it.
+	 */
+	rc = device_access_in_other_task(dev, ccwgroup_ungroup_callback);
+	if (rc)
+		count = rc;
 	return count;
 }
 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-14 16:12                                                   ` Alan Stern
@ 2007-03-14 18:43                                                     ` Cornelia Huck
  2007-03-14 19:23                                                       ` Alan Stern
  0 siblings, 1 reply; 51+ messages in thread
From: Cornelia Huck @ 2007-03-14 18:43 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Wed, 14 Mar 2007 12:12:37 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> This seems more elegant (not yet tested).  Cornelia, does it look okay to 
> you?

Works for me (grouping & ungrouping ctc) and looks sane. Some more
comments below.


> +struct other_task_struct {
> +	struct kobject 		*kobj;
> +	void			(*func)(void *);
> +	void			*data;
> +	struct work_struct	work;
> +};
> +
> +static void other_task_work(struct work_struct *work)
> +{
> +	struct other_task_struct *ots = container_of(work,
> +			struct other_task_struct, work);
> +
> +	(ots->func)(ots->data);
> +	kobject_put(ots->kobj);
> +	kfree(ots);
> +}

The naming seems a bit unintuitive, but I don't have a good
alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?

> +
> +/**
> + * sysfs_access_in_other_task - delay access from an attribute method.
> + * @kobj: object we're acting for.
> + * @func: callback function to invoke later.
> + * @data: argument to pass to @func.
> + *
> + * sysfs attribute methods must not unregister themselves or their parent
> + * kobject (which would amount to the same thing).  Attempts to do so will
> + * deadlock, since unregistration is mutually exclusive with driver
> + * callbacks.
> + *
> + * Instead methods can call this routine, which will attempt to allocate
> + * and schedule a workqueue request to carry out the requested function
> + * in the workqueue's process context.
> + *
> + * Returns 0 if the request was submitted, -ENOMEM if storage could not
> + * be allocated.
> + */
> +int sysfs_access_in_other_task(struct kobject *kobj, void (*func)(void *),
> +		void *data)

sysfs_delay_access()?


> +/**
> + * device_access_in_other_task - delay access from an attribute method.
> + * @dev: device.
> + * @func: callback function to invoke later.
> + *
> + * Attribute methods must not unregister themselves or their parent device
> + * (which would amount to the same thing).  Attempts to do so will deadlock,
> + * since unregistration is mutually exclusive with driver callbacks.
> + *
> + * Instead methods can call this routine, which will attempt to allocate
> + * and schedule a workqueue request to carry out the requested function
> + * in the workqueue's process context.
> + *
> + * Returns 0 if the request was submitted, -ENOMEM if storage could not
> + * be allocated.
> + */
> +int device_access_in_other_task(struct device *dev,
> +		void (*func)(struct device *))
> +{
> +	return sysfs_access_in_other_task(&dev->kobj,
> +			(void (*)(void *)) func, dev);
> +}
> +EXPORT_SYMBOL_GPL(device_access_in_other_task);

device_delay_access()?


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-14 18:43                                                     ` Cornelia Huck
@ 2007-03-14 19:23                                                       ` Alan Stern
  2007-03-15 10:27                                                         ` Cornelia Huck
  0 siblings, 1 reply; 51+ messages in thread
From: Alan Stern @ 2007-03-14 19:23 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Wed, 14 Mar 2007, Cornelia Huck wrote:

> On Wed, 14 Mar 2007 12:12:37 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
> 
> > This seems more elegant (not yet tested).  Cornelia, does it look okay to 
> > you?
> 
> Works for me (grouping & ungrouping ctc) and looks sane. Some more
> comments below.

Thank you.

> > +struct other_task_struct {
> > +	struct kobject 		*kobj;
> > +	void			(*func)(void *);
> > +	void			*data;
> > +	struct work_struct	work;
> > +};
> > +
> > +static void other_task_work(struct work_struct *work)
> > +{
> > +	struct other_task_struct *ots = container_of(work,
> > +			struct other_task_struct, work);
> > +
> > +	(ots->func)(ots->data);
> > +	kobject_put(ots->kobj);
> > +	kfree(ots);
> > +}
> 
> The naming seems a bit unintuitive, but I don't have a good
> alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?

sysfs_work_struct is too generic; other parts of sysfs might also want to
use workqueues for different purposes.

I don't like calling it "delayed"-anything, because the operations aren't
necessarily delayed!  On an SMP system they might even execute before the
sysfs_access_in_other_task() call returns.  (Although the two examples we
have so far can't do that because of lock contention.)

The major feature added here is that the work takes place in a different 
task's context, not that it is delayed.  Hence the choice of names.

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-14 19:23                                                       ` Alan Stern
@ 2007-03-15 10:27                                                         ` Cornelia Huck
  2007-03-15 12:31                                                           ` Hugh Dickins
  2007-03-15 14:27                                                           ` Alan Stern
  0 siblings, 2 replies; 51+ messages in thread
From: Cornelia Huck @ 2007-03-15 10:27 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Wed, 14 Mar 2007 15:23:10 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> > > +struct other_task_struct {
> > > +	struct kobject 		*kobj;
> > > +	void			(*func)(void *);
> > > +	void			*data;
> > > +	struct work_struct	work;
> > > +};
> > > +
> > > +static void other_task_work(struct work_struct *work)
> > > +{
> > > +	struct other_task_struct *ots = container_of(work,
> > > +			struct other_task_struct, work);
> > > +
> > > +	(ots->func)(ots->data);
> > > +	kobject_put(ots->kobj);
> > > +	kfree(ots);
> > > +}
> > 
> > The naming seems a bit unintuitive, but I don't have a good
> > alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?
> 
> sysfs_work_struct is too generic; other parts of sysfs might also want to
> use workqueues for different purposes.

> I don't like calling it "delayed"-anything, because the operations aren't
> necessarily delayed!  On an SMP system they might even execute before the
> sysfs_access_in_other_task() call returns.  (Although the two examples we
> have so far can't do that because of lock contention.)

Sure. But then you shouldn't refer to "delay" in the comments for the
functions as well :)

> The major feature added here is that the work takes place in a different 
> task's context, not that it is delayed.  Hence the choice of names.

Hm. Perhaps device_schedule_access()?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 10:27                                                         ` Cornelia Huck
@ 2007-03-15 12:31                                                           ` Hugh Dickins
  2007-03-15 13:02                                                             ` Oliver Neukum
  2007-03-15 14:27                                                           ` Alan Stern
  1 sibling, 1 reply; 51+ messages in thread
From: Hugh Dickins @ 2007-03-15 12:31 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Alan Stern, Linus Torvalds, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Cornelia Huck wrote:
> On Wed, 14 Mar 2007 15:23:10 -0400 (EDT),
> Alan Stern <stern@rowland.harvard.edu> wrote:
> > 
> > sysfs_work_struct is too generic; other parts of sysfs might also want to
> > use workqueues for different purposes.
> 
> > I don't like calling it "delayed"-anything, because the operations aren't
> > necessarily delayed!  On an SMP system they might even execute before the
> > sysfs_access_in_other_task() call returns.  (Although the two examples we
> > have so far can't do that because of lock contention.)
> 
> Sure. But then you shouldn't refer to "delay" in the comments for the
> functions as well :)
> 
> > The major feature added here is that the work takes place in a different 
> > task's context, not that it is delayed.  Hence the choice of names.
> 
> Hm. Perhaps device_schedule_access()?

It's really none of my business, I'm merely the reporter the
deadlock being fixed, and I don't know my way around sysfs at all ...

... but I have to say I share your discomfort with Alan's
"sysfs_access_in_other_task" naming, it sounded very weird to me.

Quite apart from this mysterious "other task", I don't understand
"access" either.

Perhaps "defer" would best capture the idea of another-task and
maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?

Hugh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 12:31                                                           ` Hugh Dickins
@ 2007-03-15 13:02                                                             ` Oliver Neukum
  2007-03-15 13:22                                                               ` Dmitry Torokhov
  0 siblings, 1 reply; 51+ messages in thread
From: Oliver Neukum @ 2007-03-15 13:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Cornelia Huck, Alan Stern, Linus Torvalds, Dmitry Torokhov,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

Am Donnerstag, 15. März 2007 13:31 schrieb Hugh Dickins:
> Quite apart from this mysterious "other task", I don't understand
> "access" either.
> 
> Perhaps "defer" would best capture the idea of another-task and
> maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?

But we do not wish to defer or delay anything.
How about: sysfs_action_from_neutral_context

	Regards
		Oliver

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 13:02                                                             ` Oliver Neukum
@ 2007-03-15 13:22                                                               ` Dmitry Torokhov
  2007-03-15 13:59                                                                 ` Hugh Dickins
  0 siblings, 1 reply; 51+ messages in thread
From: Dmitry Torokhov @ 2007-03-15 13:22 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Hugh Dickins, Cornelia Huck, Alan Stern, Linus Torvalds,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On 3/15/07, Oliver Neukum <oneukum@suse.de> wrote:
> Am Donnerstag, 15. März 2007 13:31 schrieb Hugh Dickins:
> > Quite apart from this mysterious "other task", I don't understand
> > "access" either.
> >
> > Perhaps "defer" would best capture the idea of another-task and
> > maybe-delay? sysfs_defer_work(), struct sysfs_deferred_work?
>
> But we do not wish to defer or delay anything.
> How about: sysfs_action_from_neutral_context
>

How about sysfs_schedule_work? That is what it does - schedules a work
on a sysfs object and everyone here knows what schedule_work() does.

-- 
Dmitry

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 13:22                                                               ` Dmitry Torokhov
@ 2007-03-15 13:59                                                                 ` Hugh Dickins
  0 siblings, 0 replies; 51+ messages in thread
From: Hugh Dickins @ 2007-03-15 13:59 UTC (permalink / raw)
  To: Dmitry Torokhov
  Cc: Oliver Neukum, Cornelia Huck, Alan Stern, Linus Torvalds,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Dmitry Torokhov wrote:
> 
> How about sysfs_schedule_work? That is what it does - schedules a work
> on a sysfs object and everyone here knows what schedule_work() does.

I'm ashamed to have suggested anything else: certainly gets my vote.

Hugh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 10:27                                                         ` Cornelia Huck
  2007-03-15 12:31                                                           ` Hugh Dickins
@ 2007-03-15 14:27                                                           ` Alan Stern
  2007-03-15 15:32                                                             ` Cornelia Huck
  2007-03-15 16:29                                                             ` Hugh Dickins
  1 sibling, 2 replies; 51+ messages in thread
From: Alan Stern @ 2007-03-15 14:27 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Cornelia Huck wrote:

> > > The naming seems a bit unintuitive, but I don't have a good
> > > alternative idea. Perhaps sysfs_work_struct, sysfs_delayed_work()?
> > 
> > sysfs_work_struct is too generic; other parts of sysfs might also want to
> > use workqueues for different purposes.
> 
> > I don't like calling it "delayed"-anything, because the operations aren't
> > necessarily delayed!  On an SMP system they might even execute before the
> > sysfs_access_in_other_task() call returns.  (Although the two examples we
> > have so far can't do that because of lock contention.)
> 
> Sure. But then you shouldn't refer to "delay" in the comments for the
> functions as well :)

Fair enough.  One use of "delay" is in a comment you wrote; I'll change it 
as well.

> > The major feature added here is that the work takes place in a different 
> > task's context, not that it is delayed.  Hence the choice of names.
> 
> Hm. Perhaps device_schedule_access()?

On Thu, 15 Mar 2007, Hugh Dickins wrote:

> It's really none of my business, I'm merely the reporter the
> deadlock being fixed, and I don't know my way around sysfs at all ...
> 
> ... but I have to say I share your discomfort with Alan's
> "sysfs_access_in_other_task" naming, it sounded very weird to me.
> 
> Quite apart from this mysterious "other task", I don't understand
> "access" either.
> 
> Perhaps "defer" would best capture the idea of another-task and
> maybe-delay?  sysfs_defer_work(), struct sysfs_deferred_work?  

On Thu, 15 Mar 2007, Oliver Neukum wrote:

> But we do not wish to defer or delay anything.
> How about: sysfs_action_from_neutral_context  

On Thu, 15 Mar 2007, Dmitry Torokhov wrote:

> How about sysfs_schedule_work? That is what it does - schedules a work
> on a sysfs object and everyone here knows what schedule_work() does.  

On Thu, 15 Mar 2007, Hugh Dickins wrote:

> I'm ashamed to have suggested anything else: certainly gets my vote.

Personally I don't understand what was wrong with my name.  What's weird 
or unintuitive about doing something in a different task's context?

Dmitry's suggestion is slightly inappropriate because the function doesn't
take a workstruct as an argument and it isn't itself a workqueue callback.  

Would people be happier with sysfs_schedule_callback() and
device_schedule_callback()?  At least the functions do take a callback 
pointer as an argument, even though they aren't callbacks themselves.

Alan Stern


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 14:27                                                           ` Alan Stern
@ 2007-03-15 15:32                                                             ` Cornelia Huck
  2007-03-15 16:29                                                             ` Hugh Dickins
  1 sibling, 0 replies; 51+ messages in thread
From: Cornelia Huck @ 2007-03-15 15:32 UTC (permalink / raw)
  To: Alan Stern
  Cc: Linus Torvalds, Hugh Dickins, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007 10:27:19 -0400 (EDT),
Alan Stern <stern@rowland.harvard.edu> wrote:

> Fair enough.  One use of "delay" is in a comment you wrote; I'll change it 
> as well.

Fine with me.

> Would people be happier with sysfs_schedule_callback() and
> device_schedule_callback()?  At least the functions do take a callback 
> pointer as an argument, even though they aren't callbacks themselves.

Count one happy person here.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 14:27                                                           ` Alan Stern
  2007-03-15 15:32                                                             ` Cornelia Huck
@ 2007-03-15 16:29                                                             ` Hugh Dickins
  2007-03-15 16:51                                                               ` Linus Torvalds
  1 sibling, 1 reply; 51+ messages in thread
From: Hugh Dickins @ 2007-03-15 16:29 UTC (permalink / raw)
  To: Alan Stern
  Cc: Cornelia Huck, Linus Torvalds, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

On Thu, 15 Mar 2007, Alan Stern wrote:
> 
> Personally I don't understand what was wrong with my name.  What's weird 
> or unintuitive about doing something in a different task's context?

The only thing wrong with sysfs_do_something_in_a_different_task_context()
is the length of the name.  "do", that's good, much better than "access".

sysfs_access_in_other_task() left me wondering what this "other" task
was, and what kind of "access" it's trying to get - or is the calling
task the other, and it's trying to access something it wouldn't
directly have access to?

> 
> Dmitry's suggestion is slightly inappropriate because the function doesn't
> take a workstruct as an argument and it isn't itself a workqueue callback.  

True, though since he's saying "work" rather than "workstruct",
I was okay with that: it's a sysfs wrapper to schedule_work().

> 
> Would people be happier with sysfs_schedule_callback() and
> device_schedule_callback()?  At least the functions do take a callback 
> pointer as an argument, even though they aren't callbacks themselves.

A lot happier than with sysfs_access_in_other_task() -
if you prefer this to Dmitry's, it's okay by me.

Hugh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: 2.6.21-rc suspend regression: sysfs deadlock
  2007-03-15 16:29                                                             ` Hugh Dickins
@ 2007-03-15 16:51                                                               ` Linus Torvalds
  2007-03-15 19:50                                                                 ` [PATCH] sysfs and driver core: add callback helper, used by SCSI and S390 Alan Stern
  2007-03-15 19:51                                                                 ` [PATCH] sysfs: reinstate exclusion between method calls and attribute unregistration Alan Stern
  0 siblings, 2 replies; 51+ messages in thread
From: Linus Torvalds @ 2007-03-15 16:51 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Stern, Cornelia Huck, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list



On Thu, 15 Mar 2007, Hugh Dickins wrote:
> 
> sysfs_access_in_other_task() left me wondering what this "other" task
> was, and what kind of "access" it's trying to get - or is the calling
> task the other, and it's trying to access something it wouldn't
> directly have access to?

For naming clashes, I'd suggest:

 - try to name according to *why* something is done, not necessarily what 
   it does.

   For example, is it really in "another task"? Maybe it's just an 
   on-demand thread of the same task?  Do you actually care how the 
   deferred work is done?

 - avoid being vague. I agree with not liking the name much, and the 
   "other" thing bothers me. Like Hugh, it makes me ask "_What_ other 
   task?"

So I would suggest not concentrating on some implementation issue, but on 
the reason why you need it in the first place. Namely that you want to 
defer the actual action to avoid deadlock due to recursive locking. So 
that "why do I actually do this" thing implies something like 
"sysfs_store_async()" or "sysfs_store_deferred()" or maybe actually 
concentrate on the locking angle and say something like 
"sysfs_store_needs_to_reacquire_lock()".

(That last one wasn't really serious - it's too long and cumbersome, but 
it's an example of not caring _how_ you do it, just abotu what you want 
done).

		Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] sysfs and driver core: add callback helper, used by SCSI and S390
  2007-03-15 16:51                                                               ` Linus Torvalds
@ 2007-03-15 19:50                                                                 ` Alan Stern
  2007-03-15 19:51                                                                 ` [PATCH] sysfs: reinstate exclusion between method calls and attribute unregistration Alan Stern
  1 sibling, 0 replies; 51+ messages in thread
From: Alan Stern @ 2007-03-15 19:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Cornelia Huck, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	SCSI development list, linux-s390, Kernel development list

This patch (as868) adds a helper routine for device drivers that need
to set up a callback to perform some action in a different process's
context.  This is intended for use by attribute methods that want to
unregister themselves or their parent device.  Attribute method calls
are mutually exclusive with unregistration, so such actions cannot be
taken directly.

Two attribute methods are converted to use the new helper routine: one
for SCSI device deletion and one for System/390 ccwgroup devices.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

---

Index: 2.6.21-rc3-git9/include/linux/sysfs.h
===================================================================
--- 2.6.21-rc3-git9.orig/include/linux/sysfs.h
+++ 2.6.21-rc3-git9/include/linux/sysfs.h
@@ -78,6 +78,9 @@ struct sysfs_ops {
 
 #ifdef CONFIG_SYSFS
 
+extern int sysfs_schedule_callback(struct kobject *kobj,
+		void (*func)(void *), void *data);
+
 extern int __must_check
 sysfs_create_dir(struct kobject *, struct dentry *);
 
@@ -132,6 +135,12 @@ extern int __must_check sysfs_init(void)
 
 #else /* CONFIG_SYSFS */
 
+static inline int sysfs_schedule_callback(struct kobject *kobj,
+		void (*func)(void *), void *data)
+{
+	return -ENOSYS;
+}
+
 static inline int sysfs_create_dir(struct kobject * k, struct dentry *shadow)
 {
 	return 0;
Index: 2.6.21-rc3-git9/fs/sysfs/file.c
===================================================================
--- 2.6.21-rc3-git9.orig/fs/sysfs/file.c
+++ 2.6.21-rc3-git9/fs/sysfs/file.c
@@ -629,6 +629,60 @@ void sysfs_remove_file_from_group(struct
 }
 EXPORT_SYMBOL_GPL(sysfs_remove_file_from_group);
 
+struct sysfs_schedule_callback_struct {
+	struct kobject 		*kobj;
+	void			(*func)(void *);
+	void			*data;
+	struct work_struct	work;
+};
+
+static void sysfs_schedule_callback_work(struct work_struct *work)
+{
+	struct sysfs_schedule_callback_struct *ss = container_of(work,
+			struct sysfs_schedule_callback_struct, work);
+
+	(ss->func)(ss->data);
+	kobject_put(ss->kobj);
+	kfree(ss);
+}
+
+/**
+ * sysfs_schedule_callback - helper to schedule a callback for a kobject
+ * @kobj: object we're acting for.
+ * @func: callback function to invoke later.
+ * @data: argument to pass to @func.
+ *
+ * sysfs attribute methods must not unregister themselves or their parent
+ * kobject (which would amount to the same thing).  Attempts to do so will
+ * deadlock, since unregistration is mutually exclusive with driver
+ * callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to call back @func with @data as its
+ * argument in the workqueue's process context.  @kobj will be pinned
+ * until @func returns.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ */
+int sysfs_schedule_callback(struct kobject *kobj, void (*func)(void *),
+		void *data)
+{
+	struct sysfs_schedule_callback_struct *ss;
+
+	ss = kmalloc(sizeof(*ss), GFP_KERNEL);
+	if (!ss)
+		return -ENOMEM;
+	kobject_get(kobj);
+	ss->kobj = kobj;
+	ss->func = func;
+	ss->data = data;
+	INIT_WORK(&ss->work, sysfs_schedule_callback_work);
+	schedule_work(&ss->work);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(sysfs_schedule_callback);
+
 
 EXPORT_SYMBOL_GPL(sysfs_create_file);
 EXPORT_SYMBOL_GPL(sysfs_remove_file);
Index: 2.6.21-rc3-git9/include/linux/device.h
===================================================================
--- 2.6.21-rc3-git9.orig/include/linux/device.h
+++ 2.6.21-rc3-git9/include/linux/device.h
@@ -353,6 +353,8 @@ extern int __must_check device_create_bi
 					       struct bin_attribute *attr);
 extern void device_remove_bin_file(struct device *dev,
 				   struct bin_attribute *attr);
+extern int device_schedule_callback(struct device *dev,
+		void (*func)(struct device *));
 
 /* device resource management */
 typedef void (*dr_release_t)(struct device *dev, void *res);
Index: 2.6.21-rc3-git9/drivers/base/core.c
===================================================================
--- 2.6.21-rc3-git9.orig/drivers/base/core.c
+++ 2.6.21-rc3-git9/drivers/base/core.c
@@ -407,6 +407,35 @@ void device_remove_bin_file(struct devic
 }
 EXPORT_SYMBOL_GPL(device_remove_bin_file);
 
+/**
+ * device_schedule_callback - helper to schedule a callback for a device
+ * @dev: device.
+ * @func: callback function to invoke later.
+ *
+ * Attribute methods must not unregister themselves or their parent device
+ * (which would amount to the same thing).  Attempts to do so will deadlock,
+ * since unregistration is mutually exclusive with driver callbacks.
+ *
+ * Instead methods can call this routine, which will attempt to allocate
+ * and schedule a workqueue request to call back @func with @dev as its
+ * argument in the workqueue's process context.  @dev will be pinned until
+ * @func returns.
+ *
+ * Returns 0 if the request was submitted, -ENOMEM if storage could not
+ * be allocated.
+ *
+ * NOTE: This routine won't work if CONFIG_SYSFS isn't set!  It uses an
+ * underlying sysfs routine (since it is intended for use by attribute
+ * methods), and if sysfs isn't available you'll get nothing but -ENOSYS.
+ */
+int device_schedule_callback(struct device *dev,
+		void (*func)(struct device *))
+{
+	return sysfs_schedule_callback(&dev->kobj,
+			(void (*)(void *)) func, dev);
+}
+EXPORT_SYMBOL_GPL(device_schedule_callback);
+
 static void klist_children_get(struct klist_node *n)
 {
 	struct device *dev = container_of(n, struct device, knode_parent);
Index: 2.6.21-rc3-git9/drivers/scsi/scsi_sysfs.c
===================================================================
--- 2.6.21-rc3-git9.orig/drivers/scsi/scsi_sysfs.c
+++ 2.6.21-rc3-git9/drivers/scsi/scsi_sysfs.c
@@ -452,10 +452,22 @@ store_rescan_field (struct device *dev, 
 }
 static DEVICE_ATTR(rescan, S_IWUSR, NULL, store_rescan_field);
 
+static void sdev_store_delete_callback(struct device *dev)
+{
+	scsi_remove_device(to_scsi_device(dev));
+}
+
 static ssize_t sdev_store_delete(struct device *dev, struct device_attribute *attr, const char *buf,
 				 size_t count)
 {
-	scsi_remove_device(to_scsi_device(dev));
+	int rc;
+
+	/* An attribute cannot be unregistered by one of its own methods,
+	 * so we have to use this roundabout approach.
+	 */
+	rc = device_schedule_callback(dev, sdev_store_delete_callback);
+	if (rc)
+		count = rc;
 	return count;
 };
 static DEVICE_ATTR(delete, S_IWUSR, NULL, sdev_store_delete);
Index: 2.6.21-rc3-git9/drivers/s390/cio/ccwgroup.c
===================================================================
--- 2.6.21-rc3-git9.orig/drivers/s390/cio/ccwgroup.c
+++ 2.6.21-rc3-git9/drivers/s390/cio/ccwgroup.c
@@ -71,19 +71,31 @@ __ccwgroup_remove_symlinks(struct ccwgro
  * Provide an 'ungroup' attribute so the user can remove group devices no
  * longer needed or accidentially created. Saves memory :)
  */
+static void ccwgroup_ungroup_callback(struct device *dev)
+{
+	struct ccwgroup_device *gdev = to_ccwgroupdev(dev);
+
+	__ccwgroup_remove_symlinks(gdev);
+	device_unregister(dev);
+}
+
 static ssize_t
 ccwgroup_ungroup_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count)
 {
 	struct ccwgroup_device *gdev;
+	int rc;
 
 	gdev = to_ccwgroupdev(dev);
 
 	if (gdev->state != CCWGROUP_OFFLINE)
 		return -EINVAL;
 
-	__ccwgroup_remove_symlinks(gdev);
-	device_unregister(dev);
-
+	/* Note that we cannot unregister the device from one of its
+	 * attribute methods, so we have to use this roundabout approach.
+	 */
+	rc = device_schedule_callback(dev, ccwgroup_ungroup_callback);
+	if (rc)
+		count = rc;
 	return count;
 }
 


^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] sysfs: reinstate exclusion between method calls and attribute unregistration
  2007-03-15 16:51                                                               ` Linus Torvalds
  2007-03-15 19:50                                                                 ` [PATCH] sysfs and driver core: add callback helper, used by SCSI and S390 Alan Stern
@ 2007-03-15 19:51                                                                 ` Alan Stern
  1 sibling, 0 replies; 51+ messages in thread
From: Alan Stern @ 2007-03-15 19:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Hugh Dickins, Cornelia Huck, Dmitry Torokhov, Oliver Neukum,
	Maneesh Soni, gregkh, Richard Purdie, James Bottomley,
	Kernel development list

This patch (as869) reinstates the mutual exclusion between sysfs
attribute method calls and attribute unregistration.  The
previously-reported deadlocks have been fixed, and this exclusion is
by far the simplest way to avoid races during driver unbinding.

The check for orphaned read-buffers has been moved down slightly, so
that the remainder of a partially-read buffer will still be available
to userspace even after the attribute has been unregistered.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>

---

Index: 2.6.21-rc3-git9/fs/sysfs/inode.c
===================================================================
--- 2.6.21-rc3-git9.orig/fs/sysfs/inode.c
+++ 2.6.21-rc3-git9/fs/sysfs/inode.c
@@ -222,13 +222,17 @@ const unsigned char * sysfs_get_name(str
 
 static inline void orphan_all_buffers(struct inode *node)
 {
-	struct sysfs_buffer_collection *set = node->i_private;
+	struct sysfs_buffer_collection *set;
 	struct sysfs_buffer *buf;
 
 	mutex_lock_nested(&node->i_mutex, I_MUTEX_CHILD);
-	if (node->i_private) {
-		list_for_each_entry(buf, &set->associates, associates)
+	set = node->i_private;
+	if (set) {
+		list_for_each_entry(buf, &set->associates, associates) {
+			down(&buf->sem);
 			buf->orphaned = 1;
+			up(&buf->sem);
+		}
 	}
 	mutex_unlock(&node->i_mutex);
 }
Index: 2.6.21-rc3-git9/fs/sysfs/file.c
===================================================================
--- 2.6.21-rc3-git9.orig/fs/sysfs/file.c
+++ 2.6.21-rc3-git9/fs/sysfs/file.c
@@ -168,12 +168,12 @@ sysfs_read_file(struct file *file, char 
 	ssize_t retval = 0;
 
 	down(&buffer->sem);
-	if (buffer->orphaned) {
-		retval = -ENODEV;
-		goto out;
-	}
 	if (buffer->needs_read_fill) {
-		if ((retval = fill_read_buffer(file->f_path.dentry,buffer)))
+		if (buffer->orphaned)
+			retval = -ENODEV;
+		else
+			retval = fill_read_buffer(file->f_path.dentry,buffer);
+		if (retval)
 			goto out;
 	}
 	pr_debug("%s: count = %zd, ppos = %lld, buf = %s\n",


^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2007-03-15 19:51 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-08 13:05 refcounting drivers' data structures used in sysfs buffers Oliver Neukum
2007-03-08 16:02 ` Alan Stern
2007-03-09  0:45   ` Oliver Neukum
2007-03-09 16:32     ` Alan Stern
2007-03-09 16:44       ` Oliver Neukum
2007-03-09 17:02         ` Dmitry Torokhov
2007-03-09 17:18           ` Oliver Neukum
2007-03-09 17:34             ` Dmitry Torokhov
2007-03-09 19:32               ` Alan Stern
2007-03-09 20:05                 ` Oliver Neukum
2007-03-09 20:27                   ` Alan Stern
2007-03-09 20:39                     ` Oliver Neukum
2007-03-09 20:08               ` Alan Stern
2007-03-09 20:48                 ` Oliver Neukum
2007-03-10 19:19                   ` Alan Stern
2007-03-12  8:54                     ` Oliver Neukum
2007-03-12 14:57                       ` Alan Stern
2007-03-12 15:23                         ` Oliver Neukum
2007-03-12 15:42                           ` Dmitry Torokhov
2007-03-12 15:59                             ` Oliver Neukum
2007-03-12 16:21                               ` Alan Stern
2007-03-12 18:25                                 ` Oliver Neukum
2007-03-12 19:31                                   ` Alan Stern
2007-03-12 19:49                                     ` Oliver Neukum
2007-03-12 20:03                                       ` Alan Stern
2007-03-12 20:15                                         ` Oliver Neukum
2007-03-12 20:31                                         ` Dmitry Torokhov
2007-03-12 20:45                                           ` Alan Stern
2007-03-12 21:31                                           ` Richard Purdie
2007-03-13 15:00                                             ` 2.6.21-rc suspend regression: sysfs deadlock Alan Stern
2007-03-13 18:42                                               ` Cornelia Huck
2007-03-13 21:20                                                 ` Linus Torvalds
2007-03-14 16:12                                                   ` Alan Stern
2007-03-14 18:43                                                     ` Cornelia Huck
2007-03-14 19:23                                                       ` Alan Stern
2007-03-15 10:27                                                         ` Cornelia Huck
2007-03-15 12:31                                                           ` Hugh Dickins
2007-03-15 13:02                                                             ` Oliver Neukum
2007-03-15 13:22                                                               ` Dmitry Torokhov
2007-03-15 13:59                                                                 ` Hugh Dickins
2007-03-15 14:27                                                           ` Alan Stern
2007-03-15 15:32                                                             ` Cornelia Huck
2007-03-15 16:29                                                             ` Hugh Dickins
2007-03-15 16:51                                                               ` Linus Torvalds
2007-03-15 19:50                                                                 ` [PATCH] sysfs and driver core: add callback helper, used by SCSI and S390 Alan Stern
2007-03-15 19:51                                                                 ` [PATCH] sysfs: reinstate exclusion between method calls and attribute unregistration Alan Stern
2007-03-13 19:00                                               ` 2.6.21-rc suspend regression: sysfs deadlock Hugh Dickins
2007-03-13 20:09                                                 ` Alan Stern
2007-03-13 20:55                                                   ` Hugh Dickins
2007-03-13 21:08                                                     ` Dmitry Torokhov
2007-03-13 21:20                                                     ` Alan Stern

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).