All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] cdev_put() race condition
@ 2008-12-08 20:56 ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-08 20:56 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, v4l, Laurent Pinchart

Hi Greg,

Laurent found a race condition in the uvc driver that we traced to the 
way chrdev_open and cdev_put/get work.

You need the following ingredients to reproduce it:

1) a hot-pluggable char device like an USB webcam.
2) a manually created device node for such a webcam instead of relying 
on udev.

In order to easily force this situation you would also need to add a  
delay to the char device's release() function. For webcams that would 
be at the top of v4l2_chardev_release() in 
drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge 
would have the same effect.

The sequence of events in the case of a webcam is as follows:

1) The USB device is removed, causing a disconnect.

2) The webcam driver unregisters the video device which in turn calls 
cdev_del().

3) When the last application using the device is closed, the cdev is 
released when the kref of the cdev's kobject goes to 0.

4) If the kref's release() call takes a while due to e.g. extra cleanup 
in the case of a webcam, then another application can try to open the 
video device. Note that this requires a device node created with mknod, 
otherwise the device nodes would already have been removed by udev.

5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this device 
node was never accessed before), then all is fine since kobj_lookup 
will fail because cdev_del() has been called earlier. However, if this 
device node was used earlier, then the else part is called: 
cdev_get(p). This 'p' is the cdev that is being released. Since the 
kref count is 0 you will get a WARN message from kref_get, but the code 
continues on, the f_op->open will (hopefully) return more-or-less 
gracefully with an error and the cdev_put at the end will cause the 
refcount to go to 0 again, which results in a SECOND call to the kref's 
release function!

See this link for the original discussion on the v4l list containing 
stack traces an a patch that you need if you want to (and can) test 
this with the uvc driver:

http://www.spinics.net/lists/vfl/msg39967.html

Reading Documentation/kref.txt leads me to the conclusion that a mutex 
should be introduced to prevent cdev_get from being called while 
cdev_put is in progress. It should be a mutex instead of a spinlock 
because the kref's release() can do all sorts of cleanups, some of 
which might involve waits.

I think that replacing cdev_lock with a mutex, adding it to cdev_put(), 
cdev_del() and removing it from cdev_purge() should do the trick (I 
hope).

BTW: shouldn't cdev_del() call cdev_put() instead of kobject_put()? If I 
understand it correctly then cdev_add calls cdev_get (through 
exact_lock()), so cdev_del should do the reverse, right?

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [BUG] cdev_put() race condition
@ 2008-12-08 20:56 ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-08 20:56 UTC (permalink / raw)
  To: Greg KH; +Cc: v4l, linux-kernel

Hi Greg,

Laurent found a race condition in the uvc driver that we traced to the 
way chrdev_open and cdev_put/get work.

You need the following ingredients to reproduce it:

1) a hot-pluggable char device like an USB webcam.
2) a manually created device node for such a webcam instead of relying 
on udev.

In order to easily force this situation you would also need to add a  
delay to the char device's release() function. For webcams that would 
be at the top of v4l2_chardev_release() in 
drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge 
would have the same effect.

The sequence of events in the case of a webcam is as follows:

1) The USB device is removed, causing a disconnect.

2) The webcam driver unregisters the video device which in turn calls 
cdev_del().

3) When the last application using the device is closed, the cdev is 
released when the kref of the cdev's kobject goes to 0.

4) If the kref's release() call takes a while due to e.g. extra cleanup 
in the case of a webcam, then another application can try to open the 
video device. Note that this requires a device node created with mknod, 
otherwise the device nodes would already have been removed by udev.

5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this device 
node was never accessed before), then all is fine since kobj_lookup 
will fail because cdev_del() has been called earlier. However, if this 
device node was used earlier, then the else part is called: 
cdev_get(p). This 'p' is the cdev that is being released. Since the 
kref count is 0 you will get a WARN message from kref_get, but the code 
continues on, the f_op->open will (hopefully) return more-or-less 
gracefully with an error and the cdev_put at the end will cause the 
refcount to go to 0 again, which results in a SECOND call to the kref's 
release function!

See this link for the original discussion on the v4l list containing 
stack traces an a patch that you need if you want to (and can) test 
this with the uvc driver:

http://www.spinics.net/lists/vfl/msg39967.html

Reading Documentation/kref.txt leads me to the conclusion that a mutex 
should be introduced to prevent cdev_get from being called while 
cdev_put is in progress. It should be a mutex instead of a spinlock 
because the kref's release() can do all sorts of cleanups, some of 
which might involve waits.

I think that replacing cdev_lock with a mutex, adding it to cdev_put(), 
cdev_del() and removing it from cdev_purge() should do the trick (I 
hope).

BTW: shouldn't cdev_del() call cdev_put() instead of kobject_put()? If I 
understand it correctly then cdev_add calls cdev_get (through 
exact_lock()), so cdev_del should do the reverse, right?

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-08 20:56 ` Hans Verkuil
@ 2008-12-16 10:06   ` Hans Verkuil
  -1 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-16 10:06 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, v4l, Laurent Pinchart

On Monday 08 December 2008 21:56:26 Hans Verkuil wrote:
> Hi Greg,
>
> Laurent found a race condition in the uvc driver that we traced to
> the way chrdev_open and cdev_put/get work.
>
> You need the following ingredients to reproduce it:
>
> 1) a hot-pluggable char device like an USB webcam.
> 2) a manually created device node for such a webcam instead of
> relying on udev.
>
> In order to easily force this situation you would also need to add a
> delay to the char device's release() function. For webcams that would
> be at the top of v4l2_chardev_release() in
> drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge
> would have the same effect.
>
> The sequence of events in the case of a webcam is as follows:
>
> 1) The USB device is removed, causing a disconnect.
>
> 2) The webcam driver unregisters the video device which in turn calls
> cdev_del().
>
> 3) When the last application using the device is closed, the cdev is
> released when the kref of the cdev's kobject goes to 0.
>
> 4) If the kref's release() call takes a while due to e.g. extra
> cleanup in the case of a webcam, then another application can try to
> open the video device. Note that this requires a device node created
> with mknod, otherwise the device nodes would already have been
> removed by udev.
>
> 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> device node was never accessed before), then all is fine since
> kobj_lookup will fail because cdev_del() has been called earlier.
> However, if this device node was used earlier, then the else part is
> called:
> cdev_get(p). This 'p' is the cdev that is being released. Since the
> kref count is 0 you will get a WARN message from kref_get, but the
> code continues on, the f_op->open will (hopefully) return
> more-or-less gracefully with an error and the cdev_put at the end
> will cause the refcount to go to 0 again, which results in a SECOND
> call to the kref's release function!
>
> See this link for the original discussion on the v4l list containing
> stack traces an a patch that you need if you want to (and can) test
> this with the uvc driver:
>
> http://www.spinics.net/lists/vfl/msg39967.html
>
> Reading Documentation/kref.txt leads me to the conclusion that a
> mutex should be introduced to prevent cdev_get from being called
> while cdev_put is in progress. It should be a mutex instead of a
> spinlock because the kref's release() can do all sorts of cleanups,
> some of which might involve waits.
>
> I think that replacing cdev_lock with a mutex, adding it to
> cdev_put(), cdev_del() and removing it from cdev_purge() should do
> the trick (I hope).
>
> BTW: shouldn't cdev_del() call cdev_put() instead of kobject_put()?
> If I understand it correctly then cdev_add calls cdev_get (through
> exact_lock()), so cdev_del should do the reverse, right?
>
> Regards,
>
> 	Hans

Hi Greg,

Will you have time to look at this? I can try to make a patch for this, 
but I'd like to get your feedback first.

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-16 10:06   ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-16 10:06 UTC (permalink / raw)
  To: Greg KH; +Cc: v4l, linux-kernel

On Monday 08 December 2008 21:56:26 Hans Verkuil wrote:
> Hi Greg,
>
> Laurent found a race condition in the uvc driver that we traced to
> the way chrdev_open and cdev_put/get work.
>
> You need the following ingredients to reproduce it:
>
> 1) a hot-pluggable char device like an USB webcam.
> 2) a manually created device node for such a webcam instead of
> relying on udev.
>
> In order to easily force this situation you would also need to add a
> delay to the char device's release() function. For webcams that would
> be at the top of v4l2_chardev_release() in
> drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge
> would have the same effect.
>
> The sequence of events in the case of a webcam is as follows:
>
> 1) The USB device is removed, causing a disconnect.
>
> 2) The webcam driver unregisters the video device which in turn calls
> cdev_del().
>
> 3) When the last application using the device is closed, the cdev is
> released when the kref of the cdev's kobject goes to 0.
>
> 4) If the kref's release() call takes a while due to e.g. extra
> cleanup in the case of a webcam, then another application can try to
> open the video device. Note that this requires a device node created
> with mknod, otherwise the device nodes would already have been
> removed by udev.
>
> 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> device node was never accessed before), then all is fine since
> kobj_lookup will fail because cdev_del() has been called earlier.
> However, if this device node was used earlier, then the else part is
> called:
> cdev_get(p). This 'p' is the cdev that is being released. Since the
> kref count is 0 you will get a WARN message from kref_get, but the
> code continues on, the f_op->open will (hopefully) return
> more-or-less gracefully with an error and the cdev_put at the end
> will cause the refcount to go to 0 again, which results in a SECOND
> call to the kref's release function!
>
> See this link for the original discussion on the v4l list containing
> stack traces an a patch that you need if you want to (and can) test
> this with the uvc driver:
>
> http://www.spinics.net/lists/vfl/msg39967.html
>
> Reading Documentation/kref.txt leads me to the conclusion that a
> mutex should be introduced to prevent cdev_get from being called
> while cdev_put is in progress. It should be a mutex instead of a
> spinlock because the kref's release() can do all sorts of cleanups,
> some of which might involve waits.
>
> I think that replacing cdev_lock with a mutex, adding it to
> cdev_put(), cdev_del() and removing it from cdev_purge() should do
> the trick (I hope).
>
> BTW: shouldn't cdev_del() call cdev_put() instead of kobject_put()?
> If I understand it correctly then cdev_add calls cdev_get (through
> exact_lock()), so cdev_del should do the reverse, right?
>
> Regards,
>
> 	Hans

Hi Greg,

Will you have time to look at this? I can try to make a patch for this, 
but I'd like to get your feedback first.

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-08 20:56 ` Hans Verkuil
  (?)
  (?)
@ 2008-12-16 20:22 ` Greg KH
  2008-12-16 21:00     ` Hans Verkuil
  -1 siblings, 1 reply; 35+ messages in thread
From: Greg KH @ 2008-12-16 20:22 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-kernel, v4l, Laurent Pinchart

On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> Hi Greg,
> 
> Laurent found a race condition in the uvc driver that we traced to the 
> way chrdev_open and cdev_put/get work.
> 
> You need the following ingredients to reproduce it:
> 
> 1) a hot-pluggable char device like an USB webcam.
> 2) a manually created device node for such a webcam instead of relying 
> on udev.
> 
> In order to easily force this situation you would also need to add a  
> delay to the char device's release() function. For webcams that would 
> be at the top of v4l2_chardev_release() in 
> drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge 
> would have the same effect.
> 
> The sequence of events in the case of a webcam is as follows:
> 
> 1) The USB device is removed, causing a disconnect.
> 
> 2) The webcam driver unregisters the video device which in turn calls 
> cdev_del().
> 
> 3) When the last application using the device is closed, the cdev is 
> released when the kref of the cdev's kobject goes to 0.
> 
> 4) If the kref's release() call takes a while due to e.g. extra cleanup 
> in the case of a webcam, then another application can try to open the 
> video device. Note that this requires a device node created with mknod, 
> otherwise the device nodes would already have been removed by udev.
> 
> 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this device 
> node was never accessed before), then all is fine since kobj_lookup 
> will fail because cdev_del() has been called earlier. However, if this 
> device node was used earlier, then the else part is called: 
> cdev_get(p). This 'p' is the cdev that is being released. Since the 
> kref count is 0 you will get a WARN message from kref_get, but the code 
> continues on, the f_op->open will (hopefully) return more-or-less 
> gracefully with an error and the cdev_put at the end will cause the 
> refcount to go to 0 again, which results in a SECOND call to the kref's 
> release function!
> 
> See this link for the original discussion on the v4l list containing 
> stack traces an a patch that you need if you want to (and can) test 
> this with the uvc driver:
> 
> http://www.spinics.net/lists/vfl/msg39967.html

The second sentence in that message shows your problem here:
	To avoid the need of a reference count in every v4l2 driver,
	v4l2 moved to cdev which includes its own reference counting
	infrastructure based on kobject.

cdev is not ment to handle the reference counting of any object outside
of itself, and should never be embedded within anything.  I've been
thinking of taking the real "kobject" out of that structure for a long
time now, incase someone did something foolish like this.

Seems I was too late :(

So, to solve this, just remove the reliance on struct cdev in your own
structures, you don't want to do this for the very reason you have now
found (and for others, like the fact that this isn't a "real" struct
kobject in play here, just a fake one.)

Ick, what a mess.

good luck,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-16 20:22 ` Greg KH
@ 2008-12-16 21:00     ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-16 21:00 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, v4l, Laurent Pinchart

On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> > Hi Greg,
> > 
> > Laurent found a race condition in the uvc driver that we traced to the 
> > way chrdev_open and cdev_put/get work.
> > 
> > You need the following ingredients to reproduce it:
> > 
> > 1) a hot-pluggable char device like an USB webcam.
> > 2) a manually created device node for such a webcam instead of relying 
> > on udev.
> > 
> > In order to easily force this situation you would also need to add a  
> > delay to the char device's release() function. For webcams that would 
> > be at the top of v4l2_chardev_release() in 
> > drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge 
> > would have the same effect.
> > 
> > The sequence of events in the case of a webcam is as follows:
> > 
> > 1) The USB device is removed, causing a disconnect.
> > 
> > 2) The webcam driver unregisters the video device which in turn calls 
> > cdev_del().
> > 
> > 3) When the last application using the device is closed, the cdev is 
> > released when the kref of the cdev's kobject goes to 0.
> > 
> > 4) If the kref's release() call takes a while due to e.g. extra cleanup 
> > in the case of a webcam, then another application can try to open the 
> > video device. Note that this requires a device node created with mknod, 
> > otherwise the device nodes would already have been removed by udev.
> > 
> > 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this device 
> > node was never accessed before), then all is fine since kobj_lookup 
> > will fail because cdev_del() has been called earlier. However, if this 
> > device node was used earlier, then the else part is called: 
> > cdev_get(p). This 'p' is the cdev that is being released. Since the 
> > kref count is 0 you will get a WARN message from kref_get, but the code 
> > continues on, the f_op->open will (hopefully) return more-or-less 
> > gracefully with an error and the cdev_put at the end will cause the 
> > refcount to go to 0 again, which results in a SECOND call to the kref's 
> > release function!
> > 
> > See this link for the original discussion on the v4l list containing 
> > stack traces an a patch that you need if you want to (and can) test 
> > this with the uvc driver:
> > 
> > http://www.spinics.net/lists/vfl/msg39967.html
> 
> The second sentence in that message shows your problem here:
> 	To avoid the need of a reference count in every v4l2 driver,
> 	v4l2 moved to cdev which includes its own reference counting
> 	infrastructure based on kobject.
> 
> cdev is not ment to handle the reference counting of any object outside
> of itself, and should never be embedded within anything.  I've been
> thinking of taking the real "kobject" out of that structure for a long
> time now, incase someone did something foolish like this.
> 
> Seems I was too late :(
> 
> So, to solve this, just remove the reliance on struct cdev in your own
> structures, you don't want to do this for the very reason you have now
> found (and for others, like the fact that this isn't a "real" struct
> kobject in play here, just a fake one.)
> 
> Ick, what a mess.

Sorry, but this makes no sense. First of all the race condition exists regardless of how v4l uses it. Other drivers using cdev with a hot-pluggable device in combination with a manually created device node should show the same problem. It's just that we found it with v4l because the release callback takes longer than usual, thus increasing the chances of hitting the race.

The core problem is simply that it is possible to call cdev_get while in cdev_put! That should never happen.

Secondly, why shouldn't struct cdev be embedded in anything? It's used in lots of drivers that way. I really don't see what's unusual or messy about v4l in that respect.

Regards,

	Hans


-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-16 21:00     ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-16 21:00 UTC (permalink / raw)
  To: Greg KH; +Cc: v4l, linux-kernel

On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> > Hi Greg,
> > 
> > Laurent found a race condition in the uvc driver that we traced to the 
> > way chrdev_open and cdev_put/get work.
> > 
> > You need the following ingredients to reproduce it:
> > 
> > 1) a hot-pluggable char device like an USB webcam.
> > 2) a manually created device node for such a webcam instead of relying 
> > on udev.
> > 
> > In order to easily force this situation you would also need to add a  
> > delay to the char device's release() function. For webcams that would 
> > be at the top of v4l2_chardev_release() in 
> > drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge 
> > would have the same effect.
> > 
> > The sequence of events in the case of a webcam is as follows:
> > 
> > 1) The USB device is removed, causing a disconnect.
> > 
> > 2) The webcam driver unregisters the video device which in turn calls 
> > cdev_del().
> > 
> > 3) When the last application using the device is closed, the cdev is 
> > released when the kref of the cdev's kobject goes to 0.
> > 
> > 4) If the kref's release() call takes a while due to e.g. extra cleanup 
> > in the case of a webcam, then another application can try to open the 
> > video device. Note that this requires a device node created with mknod, 
> > otherwise the device nodes would already have been removed by udev.
> > 
> > 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this device 
> > node was never accessed before), then all is fine since kobj_lookup 
> > will fail because cdev_del() has been called earlier. However, if this 
> > device node was used earlier, then the else part is called: 
> > cdev_get(p). This 'p' is the cdev that is being released. Since the 
> > kref count is 0 you will get a WARN message from kref_get, but the code 
> > continues on, the f_op->open will (hopefully) return more-or-less 
> > gracefully with an error and the cdev_put at the end will cause the 
> > refcount to go to 0 again, which results in a SECOND call to the kref's 
> > release function!
> > 
> > See this link for the original discussion on the v4l list containing 
> > stack traces an a patch that you need if you want to (and can) test 
> > this with the uvc driver:
> > 
> > http://www.spinics.net/lists/vfl/msg39967.html
> 
> The second sentence in that message shows your problem here:
> 	To avoid the need of a reference count in every v4l2 driver,
> 	v4l2 moved to cdev which includes its own reference counting
> 	infrastructure based on kobject.
> 
> cdev is not ment to handle the reference counting of any object outside
> of itself, and should never be embedded within anything.  I've been
> thinking of taking the real "kobject" out of that structure for a long
> time now, incase someone did something foolish like this.
> 
> Seems I was too late :(
> 
> So, to solve this, just remove the reliance on struct cdev in your own
> structures, you don't want to do this for the very reason you have now
> found (and for others, like the fact that this isn't a "real" struct
> kobject in play here, just a fake one.)
> 
> Ick, what a mess.

Sorry, but this makes no sense. First of all the race condition exists regardless of how v4l uses it. Other drivers using cdev with a hot-pluggable device in combination with a manually created device node should show the same problem. It's just that we found it with v4l because the release callback takes longer than usual, thus increasing the chances of hitting the race.

The core problem is simply that it is possible to call cdev_get while in cdev_put! That should never happen.

Secondly, why shouldn't struct cdev be embedded in anything? It's used in lots of drivers that way. I really don't see what's unusual or messy about v4l in that respect.

Regards,

	Hans


-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-16 21:00     ` Hans Verkuil
  (?)
@ 2008-12-16 21:21     ` Greg KH
  2008-12-16 23:23         ` Hans Verkuil
  -1 siblings, 1 reply; 35+ messages in thread
From: Greg KH @ 2008-12-16 21:21 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-kernel, v4l, Laurent Pinchart

On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> > On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> > > Hi Greg,
> > > 
> > > Laurent found a race condition in the uvc driver that we traced to the 
> > > way chrdev_open and cdev_put/get work.
> > > 
> > > You need the following ingredients to reproduce it:
> > > 
> > > 1) a hot-pluggable char device like an USB webcam.
> > > 2) a manually created device node for such a webcam instead of relying 
> > > on udev.
> > > 
> > > In order to easily force this situation you would also need to add a  
> > > delay to the char device's release() function. For webcams that would 
> > > be at the top of v4l2_chardev_release() in 
> > > drivers/media/video/v4l2-dev.c. But adding a delay to e.g. cdev_purge 
> > > would have the same effect.
> > > 
> > > The sequence of events in the case of a webcam is as follows:
> > > 
> > > 1) The USB device is removed, causing a disconnect.
> > > 
> > > 2) The webcam driver unregisters the video device which in turn calls 
> > > cdev_del().
> > > 
> > > 3) When the last application using the device is closed, the cdev is 
> > > released when the kref of the cdev's kobject goes to 0.
> > > 
> > > 4) If the kref's release() call takes a while due to e.g. extra cleanup 
> > > in the case of a webcam, then another application can try to open the 
> > > video device. Note that this requires a device node created with mknod, 
> > > otherwise the device nodes would already have been removed by udev.
> > > 
> > > 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this device 
> > > node was never accessed before), then all is fine since kobj_lookup 
> > > will fail because cdev_del() has been called earlier. However, if this 
> > > device node was used earlier, then the else part is called: 
> > > cdev_get(p). This 'p' is the cdev that is being released. Since the 
> > > kref count is 0 you will get a WARN message from kref_get, but the code 
> > > continues on, the f_op->open will (hopefully) return more-or-less 
> > > gracefully with an error and the cdev_put at the end will cause the 
> > > refcount to go to 0 again, which results in a SECOND call to the kref's 
> > > release function!
> > > 
> > > See this link for the original discussion on the v4l list containing 
> > > stack traces an a patch that you need if you want to (and can) test 
> > > this with the uvc driver:
> > > 
> > > http://www.spinics.net/lists/vfl/msg39967.html
> > 
> > The second sentence in that message shows your problem here:
> > 	To avoid the need of a reference count in every v4l2 driver,
> > 	v4l2 moved to cdev which includes its own reference counting
> > 	infrastructure based on kobject.
> > 
> > cdev is not ment to handle the reference counting of any object outside
> > of itself, and should never be embedded within anything.  I've been
> > thinking of taking the real "kobject" out of that structure for a long
> > time now, incase someone did something foolish like this.
> > 
> > Seems I was too late :(
> > 
> > So, to solve this, just remove the reliance on struct cdev in your own
> > structures, you don't want to do this for the very reason you have now
> > found (and for others, like the fact that this isn't a "real" struct
> > kobject in play here, just a fake one.)
> > 
> > Ick, what a mess.
> 
> Sorry, but this makes no sense. First of all the race condition exists
> regardless of how v4l uses it. Other drivers using cdev with a
> hot-pluggable device in combination with a manually created device
> node should show the same problem.

I don't see how this would be, unless this problem has always existed in
the cdev code, it was created back before dynamic device nodes with udev
existed :)

> It's just that we found it with v4l because the release callback takes
> longer than usual, thus increasing the chances of hitting the race.

The release callback for the cdev itself?

> The core problem is simply that it is possible to call cdev_get while
> in cdev_put! That should never happen.

True, and cdev_put is only called from __fput() which has the proper
locking to handle the call to cdev_get() if a char device is opened,
right?

> Secondly, why shouldn't struct cdev be embedded in anything? It's used
> in lots of drivers that way. I really don't see what's unusual or
> messy about v4l in that respect.

I don't see it embedded in any other structures at the moment, and used
to reference count the structure, do you have pointers to where it is?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-16 21:21     ` Greg KH
@ 2008-12-16 23:23         ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-16 23:23 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, v4l, Laurent Pinchart

On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> > On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> > > On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> > > > Hi Greg,
> > > >
> > > > Laurent found a race condition in the uvc driver that we traced to
> > > > the way chrdev_open and cdev_put/get work.
> > > >
> > > > You need the following ingredients to reproduce it:
> > > >
> > > > 1) a hot-pluggable char device like an USB webcam.
> > > > 2) a manually created device node for such a webcam instead of
> > > > relying on udev.
> > > >
> > > > In order to easily force this situation you would also need to add
> > > > a delay to the char device's release() function. For webcams that
> > > > would be at the top of v4l2_chardev_release() in
> > > > drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
> > > > cdev_purge would have the same effect.
> > > >
> > > > The sequence of events in the case of a webcam is as follows:
> > > >
> > > > 1) The USB device is removed, causing a disconnect.
> > > >
> > > > 2) The webcam driver unregisters the video device which in turn
> > > > calls cdev_del().
> > > >
> > > > 3) When the last application using the device is closed, the cdev
> > > > is released when the kref of the cdev's kobject goes to 0.
> > > >
> > > > 4) If the kref's release() call takes a while due to e.g. extra
> > > > cleanup in the case of a webcam, then another application can try
> > > > to open the video device. Note that this requires a device node
> > > > created with mknod, otherwise the device nodes would already have
> > > > been removed by udev.
> > > >
> > > > 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> > > > device node was never accessed before), then all is fine since
> > > > kobj_lookup will fail because cdev_del() has been called earlier.
> > > > However, if this device node was used earlier, then the else part
> > > > is called: cdev_get(p). This 'p' is the cdev that is being
> > > > released. Since the kref count is 0 you will get a WARN message
> > > > from kref_get, but the code continues on, the f_op->open will
> > > > (hopefully) return more-or-less gracefully with an error and the
> > > > cdev_put at the end will cause the refcount to go to 0 again, which
> > > > results in a SECOND call to the kref's release function!
> > > >
> > > > See this link for the original discussion on the v4l list
> > > > containing stack traces an a patch that you need if you want to
> > > > (and can) test this with the uvc driver:
> > > >
> > > > http://www.spinics.net/lists/vfl/msg39967.html
> > >
> > > The second sentence in that message shows your problem here:
> > > 	To avoid the need of a reference count in every v4l2 driver,
> > > 	v4l2 moved to cdev which includes its own reference counting
> > > 	infrastructure based on kobject.
> > >
> > > cdev is not ment to handle the reference counting of any object
> > > outside of itself, and should never be embedded within anything. 
> > > I've been thinking of taking the real "kobject" out of that structure
> > > for a long time now, incase someone did something foolish like this.
> > >
> > > Seems I was too late :(
> > >
> > > So, to solve this, just remove the reliance on struct cdev in your
> > > own structures, you don't want to do this for the very reason you
> > > have now found (and for others, like the fact that this isn't a
> > > "real" struct kobject in play here, just a fake one.)
> > >
> > > Ick, what a mess.
> >
> > Sorry, but this makes no sense. First of all the race condition exists
> > regardless of how v4l uses it. Other drivers using cdev with a
> > hot-pluggable device in combination with a manually created device
> > node should show the same problem.
>
> I don't see how this would be, unless this problem has always existed in
> the cdev code, it was created back before dynamic device nodes with udev
> existed :)
>
> > It's just that we found it with v4l because the release callback takes
> > longer than usual, thus increasing the chances of hitting the race.
>
> The release callback for the cdev itself?

Yes, although we override the release kobj_type. I noticed that a patch to 
do this through a cdev function was recently posted (and possibly already 
merged).

> > The core problem is simply that it is possible to call cdev_get while
> > in cdev_put! That should never happen.
>
> True, and cdev_put is only called from __fput() which has the proper
> locking to handle the call to cdev_get() if a char device is opened,
> right?

cdev_put is also called through cdev_del, and that's where the problem 
resides. The cdev_del() call has no locking to prevent a call from 
cdev_get(), and it can be called asynchronously as well in response to a 
USB disconnect.

> > Secondly, why shouldn't struct cdev be embedded in anything? It's used
> > in lots of drivers that way. I really don't see what's unusual or
> > messy about v4l in that respect.
>
> I don't see it embedded in any other structures at the moment, and used
> to reference count the structure, do you have pointers to where it is?

Hmm, I took a closer look and it looks like v4l is indeed the first to use 
cdev for refcounting. Others either don't need it or do their own 
refcounting. It's definitely embedded in several structs, though. (grep for 
cdev_init)

However, I don't see how any amount of refcounting in drivers can prevent 
this race. At some point cdev_del() is called and (if nobody is using the 
chardev) cdev's kref goes to 0 triggering the release, and at that moment 
chrdev_open() can be called and the driver has no chance to prevent that, 
only cdev can do that.

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-16 23:23         ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-16 23:23 UTC (permalink / raw)
  To: Greg KH; +Cc: v4l, linux-kernel

On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> > On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> > > On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> > > > Hi Greg,
> > > >
> > > > Laurent found a race condition in the uvc driver that we traced to
> > > > the way chrdev_open and cdev_put/get work.
> > > >
> > > > You need the following ingredients to reproduce it:
> > > >
> > > > 1) a hot-pluggable char device like an USB webcam.
> > > > 2) a manually created device node for such a webcam instead of
> > > > relying on udev.
> > > >
> > > > In order to easily force this situation you would also need to add
> > > > a delay to the char device's release() function. For webcams that
> > > > would be at the top of v4l2_chardev_release() in
> > > > drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
> > > > cdev_purge would have the same effect.
> > > >
> > > > The sequence of events in the case of a webcam is as follows:
> > > >
> > > > 1) The USB device is removed, causing a disconnect.
> > > >
> > > > 2) The webcam driver unregisters the video device which in turn
> > > > calls cdev_del().
> > > >
> > > > 3) When the last application using the device is closed, the cdev
> > > > is released when the kref of the cdev's kobject goes to 0.
> > > >
> > > > 4) If the kref's release() call takes a while due to e.g. extra
> > > > cleanup in the case of a webcam, then another application can try
> > > > to open the video device. Note that this requires a device node
> > > > created with mknod, otherwise the device nodes would already have
> > > > been removed by udev.
> > > >
> > > > 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> > > > device node was never accessed before), then all is fine since
> > > > kobj_lookup will fail because cdev_del() has been called earlier.
> > > > However, if this device node was used earlier, then the else part
> > > > is called: cdev_get(p). This 'p' is the cdev that is being
> > > > released. Since the kref count is 0 you will get a WARN message
> > > > from kref_get, but the code continues on, the f_op->open will
> > > > (hopefully) return more-or-less gracefully with an error and the
> > > > cdev_put at the end will cause the refcount to go to 0 again, which
> > > > results in a SECOND call to the kref's release function!
> > > >
> > > > See this link for the original discussion on the v4l list
> > > > containing stack traces an a patch that you need if you want to
> > > > (and can) test this with the uvc driver:
> > > >
> > > > http://www.spinics.net/lists/vfl/msg39967.html
> > >
> > > The second sentence in that message shows your problem here:
> > > 	To avoid the need of a reference count in every v4l2 driver,
> > > 	v4l2 moved to cdev which includes its own reference counting
> > > 	infrastructure based on kobject.
> > >
> > > cdev is not ment to handle the reference counting of any object
> > > outside of itself, and should never be embedded within anything. 
> > > I've been thinking of taking the real "kobject" out of that structure
> > > for a long time now, incase someone did something foolish like this.
> > >
> > > Seems I was too late :(
> > >
> > > So, to solve this, just remove the reliance on struct cdev in your
> > > own structures, you don't want to do this for the very reason you
> > > have now found (and for others, like the fact that this isn't a
> > > "real" struct kobject in play here, just a fake one.)
> > >
> > > Ick, what a mess.
> >
> > Sorry, but this makes no sense. First of all the race condition exists
> > regardless of how v4l uses it. Other drivers using cdev with a
> > hot-pluggable device in combination with a manually created device
> > node should show the same problem.
>
> I don't see how this would be, unless this problem has always existed in
> the cdev code, it was created back before dynamic device nodes with udev
> existed :)
>
> > It's just that we found it with v4l because the release callback takes
> > longer than usual, thus increasing the chances of hitting the race.
>
> The release callback for the cdev itself?

Yes, although we override the release kobj_type. I noticed that a patch to 
do this through a cdev function was recently posted (and possibly already 
merged).

> > The core problem is simply that it is possible to call cdev_get while
> > in cdev_put! That should never happen.
>
> True, and cdev_put is only called from __fput() which has the proper
> locking to handle the call to cdev_get() if a char device is opened,
> right?

cdev_put is also called through cdev_del, and that's where the problem 
resides. The cdev_del() call has no locking to prevent a call from 
cdev_get(), and it can be called asynchronously as well in response to a 
USB disconnect.

> > Secondly, why shouldn't struct cdev be embedded in anything? It's used
> > in lots of drivers that way. I really don't see what's unusual or
> > messy about v4l in that respect.
>
> I don't see it embedded in any other structures at the moment, and used
> to reference count the structure, do you have pointers to where it is?

Hmm, I took a closer look and it looks like v4l is indeed the first to use 
cdev for refcounting. Others either don't need it or do their own 
refcounting. It's definitely embedded in several structs, though. (grep for 
cdev_init)

However, I don't see how any amount of refcounting in drivers can prevent 
this race. At some point cdev_del() is called and (if nobody is using the 
chardev) cdev's kref goes to 0 triggering the release, and at that moment 
chrdev_open() can be called and the driver has no chance to prevent that, 
only cdev can do that.

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-16 23:23         ` Hans Verkuil
  (?)
@ 2008-12-16 23:30         ` Greg KH
  2008-12-17 13:37             ` Hans Verkuil
  -1 siblings, 1 reply; 35+ messages in thread
From: Greg KH @ 2008-12-16 23:30 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-kernel, v4l, Laurent Pinchart

On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
> On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> > On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> > > On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> > > > On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> > > > > Hi Greg,
> > > > >
> > > > > Laurent found a race condition in the uvc driver that we traced to
> > > > > the way chrdev_open and cdev_put/get work.
> > > > >
> > > > > You need the following ingredients to reproduce it:
> > > > >
> > > > > 1) a hot-pluggable char device like an USB webcam.
> > > > > 2) a manually created device node for such a webcam instead of
> > > > > relying on udev.
> > > > >
> > > > > In order to easily force this situation you would also need to add
> > > > > a delay to the char device's release() function. For webcams that
> > > > > would be at the top of v4l2_chardev_release() in
> > > > > drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
> > > > > cdev_purge would have the same effect.
> > > > >
> > > > > The sequence of events in the case of a webcam is as follows:
> > > > >
> > > > > 1) The USB device is removed, causing a disconnect.
> > > > >
> > > > > 2) The webcam driver unregisters the video device which in turn
> > > > > calls cdev_del().
> > > > >
> > > > > 3) When the last application using the device is closed, the cdev
> > > > > is released when the kref of the cdev's kobject goes to 0.
> > > > >
> > > > > 4) If the kref's release() call takes a while due to e.g. extra
> > > > > cleanup in the case of a webcam, then another application can try
> > > > > to open the video device. Note that this requires a device node
> > > > > created with mknod, otherwise the device nodes would already have
> > > > > been removed by udev.
> > > > >
> > > > > 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> > > > > device node was never accessed before), then all is fine since
> > > > > kobj_lookup will fail because cdev_del() has been called earlier.
> > > > > However, if this device node was used earlier, then the else part
> > > > > is called: cdev_get(p). This 'p' is the cdev that is being
> > > > > released. Since the kref count is 0 you will get a WARN message
> > > > > from kref_get, but the code continues on, the f_op->open will
> > > > > (hopefully) return more-or-less gracefully with an error and the
> > > > > cdev_put at the end will cause the refcount to go to 0 again, which
> > > > > results in a SECOND call to the kref's release function!
> > > > >
> > > > > See this link for the original discussion on the v4l list
> > > > > containing stack traces an a patch that you need if you want to
> > > > > (and can) test this with the uvc driver:
> > > > >
> > > > > http://www.spinics.net/lists/vfl/msg39967.html
> > > >
> > > > The second sentence in that message shows your problem here:
> > > > 	To avoid the need of a reference count in every v4l2 driver,
> > > > 	v4l2 moved to cdev which includes its own reference counting
> > > > 	infrastructure based on kobject.
> > > >
> > > > cdev is not ment to handle the reference counting of any object
> > > > outside of itself, and should never be embedded within anything. 
> > > > I've been thinking of taking the real "kobject" out of that structure
> > > > for a long time now, incase someone did something foolish like this.
> > > >
> > > > Seems I was too late :(
> > > >
> > > > So, to solve this, just remove the reliance on struct cdev in your
> > > > own structures, you don't want to do this for the very reason you
> > > > have now found (and for others, like the fact that this isn't a
> > > > "real" struct kobject in play here, just a fake one.)
> > > >
> > > > Ick, what a mess.
> > >
> > > Sorry, but this makes no sense. First of all the race condition exists
> > > regardless of how v4l uses it. Other drivers using cdev with a
> > > hot-pluggable device in combination with a manually created device
> > > node should show the same problem.
> >
> > I don't see how this would be, unless this problem has always existed in
> > the cdev code, it was created back before dynamic device nodes with udev
> > existed :)
> >
> > > It's just that we found it with v4l because the release callback takes
> > > longer than usual, thus increasing the chances of hitting the race.
> >
> > The release callback for the cdev itself?
> 
> Yes, although we override the release kobj_type.

Hahaha, yeah, I'm really afraid to say it, but you deserve the oops if
you do that :)

> I noticed that a patch to do this through a cdev function was recently
> posted (and possibly already merged).

No, it was rejected as it was not needed.

> > > The core problem is simply that it is possible to call cdev_get while
> > > in cdev_put! That should never happen.
> >
> > True, and cdev_put is only called from __fput() which has the proper
> > locking to handle the call to cdev_get() if a char device is opened,
> > right?
> 
> cdev_put is also called through cdev_del, and that's where the problem 
> resides. The cdev_del() call has no locking to prevent a call from 
> cdev_get(), and it can be called asynchronously as well in response to a 
> USB disconnect.
> 
> > > Secondly, why shouldn't struct cdev be embedded in anything? It's used
> > > in lots of drivers that way. I really don't see what's unusual or
> > > messy about v4l in that respect.
> >
> > I don't see it embedded in any other structures at the moment, and used
> > to reference count the structure, do you have pointers to where it is?
> 
> Hmm, I took a closer look and it looks like v4l is indeed the first to use 
> cdev for refcounting. Others either don't need it or do their own 
> refcounting. It's definitely embedded in several structs, though. (grep for 
> cdev_init)

Yes, embedding it is fine, just don't use it for reference counting as
it is using it's own reference counting for something other than what
you want to use it for.

Overriding the release function should have given you the hint that this
is not the structure that you should be using for this kind of
operation.

> However, I don't see how any amount of refcounting in drivers can prevent 
> this race. At some point cdev_del() is called and (if nobody is using the 
> chardev) cdev's kref goes to 0 triggering the release, and at that moment 
> chrdev_open() can be called and the driver has no chance to prevent that, 
> only cdev can do that.

Again, don't use cdev's reference counting for your own object
lifecycle, it is different and will cause problems, like you have found
out.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-16 23:30         ` Greg KH
@ 2008-12-17 13:37             ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 13:37 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, v4l, Laurent Pinchart

On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
> > On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> > > On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> > > > On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> > > > > On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> > > > > > Hi Greg,
> > > > > >
> > > > > > Laurent found a race condition in the uvc driver that we traced
> > > > > > to the way chrdev_open and cdev_put/get work.
> > > > > >
> > > > > > You need the following ingredients to reproduce it:
> > > > > >
> > > > > > 1) a hot-pluggable char device like an USB webcam.
> > > > > > 2) a manually created device node for such a webcam instead of
> > > > > > relying on udev.
> > > > > >
> > > > > > In order to easily force this situation you would also need to
> > > > > > add a delay to the char device's release() function. For
> > > > > > webcams that would be at the top of v4l2_chardev_release() in
> > > > > > drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
> > > > > > cdev_purge would have the same effect.
> > > > > >
> > > > > > The sequence of events in the case of a webcam is as follows:
> > > > > >
> > > > > > 1) The USB device is removed, causing a disconnect.
> > > > > >
> > > > > > 2) The webcam driver unregisters the video device which in turn
> > > > > > calls cdev_del().
> > > > > >
> > > > > > 3) When the last application using the device is closed, the
> > > > > > cdev is released when the kref of the cdev's kobject goes to 0.
> > > > > >
> > > > > > 4) If the kref's release() call takes a while due to e.g. extra
> > > > > > cleanup in the case of a webcam, then another application can
> > > > > > try to open the video device. Note that this requires a device
> > > > > > node created with mknod, otherwise the device nodes would
> > > > > > already have been removed by udev.
> > > > > >
> > > > > > 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> > > > > > device node was never accessed before), then all is fine since
> > > > > > kobj_lookup will fail because cdev_del() has been called
> > > > > > earlier. However, if this device node was used earlier, then
> > > > > > the else part is called: cdev_get(p). This 'p' is the cdev that
> > > > > > is being released. Since the kref count is 0 you will get a
> > > > > > WARN message from kref_get, but the code continues on, the
> > > > > > f_op->open will (hopefully) return more-or-less gracefully with
> > > > > > an error and the cdev_put at the end will cause the refcount to
> > > > > > go to 0 again, which results in a SECOND call to the kref's
> > > > > > release function!
> > > > > >
> > > > > > See this link for the original discussion on the v4l list
> > > > > > containing stack traces an a patch that you need if you want to
> > > > > > (and can) test this with the uvc driver:
> > > > > >
> > > > > > http://www.spinics.net/lists/vfl/msg39967.html
> > > > >
> > > > > The second sentence in that message shows your problem here:
> > > > > 	To avoid the need of a reference count in every v4l2 driver,
> > > > > 	v4l2 moved to cdev which includes its own reference counting
> > > > > 	infrastructure based on kobject.
> > > > >
> > > > > cdev is not ment to handle the reference counting of any object
> > > > > outside of itself, and should never be embedded within anything.
> > > > > I've been thinking of taking the real "kobject" out of that
> > > > > structure for a long time now, incase someone did something
> > > > > foolish like this.
> > > > >
> > > > > Seems I was too late :(
> > > > >
> > > > > So, to solve this, just remove the reliance on struct cdev in
> > > > > your own structures, you don't want to do this for the very
> > > > > reason you have now found (and for others, like the fact that
> > > > > this isn't a "real" struct kobject in play here, just a fake
> > > > > one.)
> > > > >
> > > > > Ick, what a mess.
> > > >
> > > > Sorry, but this makes no sense. First of all the race condition
> > > > exists regardless of how v4l uses it. Other drivers using cdev with
> > > > a hot-pluggable device in combination with a manually created
> > > > device node should show the same problem.
> > >
> > > I don't see how this would be, unless this problem has always existed
> > > in the cdev code, it was created back before dynamic device nodes
> > > with udev existed :)
> > >
> > > > It's just that we found it with v4l because the release callback
> > > > takes longer than usual, thus increasing the chances of hitting the
> > > > race.
> > >
> > > The release callback for the cdev itself?
> >
> > Yes, although we override the release kobj_type.
>
> Hahaha, yeah, I'm really afraid to say it, but you deserve the oops if
> you do that :)
>
> > I noticed that a patch to do this through a cdev function was recently
> > posted (and possibly already merged).
>
> No, it was rejected as it was not needed.
>
> > > > The core problem is simply that it is possible to call cdev_get
> > > > while in cdev_put! That should never happen.
> > >
> > > True, and cdev_put is only called from __fput() which has the proper
> > > locking to handle the call to cdev_get() if a char device is opened,
> > > right?
> >
> > cdev_put is also called through cdev_del, and that's where the problem
> > resides. The cdev_del() call has no locking to prevent a call from
> > cdev_get(), and it can be called asynchronously as well in response to
> > a USB disconnect.
> >
> > > > Secondly, why shouldn't struct cdev be embedded in anything? It's
> > > > used in lots of drivers that way. I really don't see what's unusual
> > > > or messy about v4l in that respect.
> > >
> > > I don't see it embedded in any other structures at the moment, and
> > > used to reference count the structure, do you have pointers to where
> > > it is?
> >
> > Hmm, I took a closer look and it looks like v4l is indeed the first to
> > use cdev for refcounting. Others either don't need it or do their own
> > refcounting. It's definitely embedded in several structs, though. (grep
> > for cdev_init)
>
> Yes, embedding it is fine, just don't use it for reference counting as
> it is using it's own reference counting for something other than what
> you want to use it for.
>
> Overriding the release function should have given you the hint that this
> is not the structure that you should be using for this kind of
> operation.
>
> > However, I don't see how any amount of refcounting in drivers can
> > prevent this race. At some point cdev_del() is called and (if nobody is
> > using the chardev) cdev's kref goes to 0 triggering the release, and at
> > that moment chrdev_open() can be called and the driver has no chance to
> > prevent that, only cdev can do that.
>
> Again, don't use cdev's reference counting for your own object
> lifecycle, it is different and will cause problems, like you have found
> out.

Sigh. It has nothing to do with how v4l uses it. And to demonstrate this, 
here is how you reproduce it with the sg module (tested it with my USB 
harddisk).

1) apply this patch to char_dev.c:

--- char_dev.c.orig	2008-12-17 13:30:22.000000000 +0100
+++ char_dev.c	2008-12-17 08:50:23.000000000 +0100
@@ -412,6 +412,9 @@
 
 static void cdev_purge(struct cdev *cdev)
 {
+	printk("delayin\n");
+	mdelay(2000);
+	printk("delayout\n");
 	spin_lock(&cdev_lock);
 	while (!list_empty(&cdev->list)) {
 		struct inode *inode;


2) connect your harddisk and make note of the sgX device that's created.
3) use mknod to create an sgX device manually for that harddisk.
4) open and close it (cat ./sgX >/dev/null followed by ctrl-C is sufficient)
5) disconnect the harddisk
6) when 'delayin' appears in the log, run cat ./sgX >/dev/null again
7) sit back and enjoy this lovely race condition:

usb-storage: device found at 6
usb-storage: waiting for device to settle before scanning
scsi 6:0:0:0: Direct-Access     WD       5000BEV External 1.05 PQ: 0 ANSI: 4
sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
sd 6:0:0:0: [sdg] Write Protect is off
sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
sd 6:0:0:0: [sdg] Assuming drive cache: write through
sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
sd 6:0:0:0: [sdg] Write Protect is off
sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
sd 6:0:0:0: [sdg] Assuming drive cache: write through
sdg: sdg1
sd 6:0:0:0: [sdg] Attached SCSI disk
sd 6:0:0:0: Attached scsi generic sg8 type 0
usb-storage: device scan complete
usb 2-4.1: USB disconnect, address 6
delayin
------------[ cut here ]------------
WARNING: at lib/kref.c:43 kref_get+0x17/0x1c()
Modules linked in: sg
Pid: 1491, comm: cat Not tainted 2.6.26-durdane-test #3
[<c0134ad4>] warn_on_slowpath+0x40/0x79
[<c0160ff8>] __do_fault+0x2e4/0x334
[<c017e0c7>] __d_lookup+0xce/0x115
[<c0175393>] do_lookup+0x53/0x145
[<c017d523>] dput+0x16/0xdc
[<c0176d53>] __link_path_walk+0xb36/0xc30
[<c0181d44>] mntput_no_expire+0x18/0xef
[<c02225fb>] kref_get+0x17/0x1c
[<c0221be2>] kobject_get+0xf/0x13
[<c01710b1>] cdev_get+0x55/0x69
[<c01712ce>] chrdev_open+0xc2/0x184
[<c017120c>] chrdev_open+0x0/0x184
[<c016dd3b>] __dentry_open+0x127/0x210
[<c016dea1>] nameidata_to_filp+0x1c/0x2c
[<c01780c4>] do_filp_open+0x33d/0x6d5
[<c016daee>] get_unused_fd_flags+0xb8/0xc2
[<c016db35>] do_sys_open+0x3d/0xb6
[<c016dbf2>] sys_open+0x1e/0x23
[<c01188a9>] sysenter_past_esp+0x6a/0x91
=======================
---[ end trace c8225d49e3c03b85 ]---
delayin
delayout
delayout

Note the duplicate 'delayin' messages. Also note that the cdev struct was 
allocated by sg.c, so the second cdev cleanup will likely poke into already 
freed memory.

I tried to reproduce it as well without the patched char_dev.c by using this 
little program:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

main()
{
        int state = 0;

        while (1) {
                int fh = open("./sg8", O_RDONLY);
                if (fh >= 0) {
                        if (state == 0) {
                                printf("opened\n");
                                state = 1;
                        }
                        close(fh);
                }
                else {
                        if (state) {
                                printf("error\n");
                                state = 0;
                        }
                }
        }
}

Just run it and connect/disconnect the USB harddisk.

Unfortunately, I wasn't able to reproduce it because I always hit a bug in 
khelper first after only a few attempts:

sd 13:0:0:0: [sdg] Assuming drive cache: write through
sdg: sdg1
sd 13:0:0:0: [sdg] Attached SCSI disk
sd 13:0:0:0: Attached scsi generic sg8 type 0
usb-storage: device scan complete
usb 2-4.1: USB disconnect, address 13
VFS: Close: file count is 0
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25
*pdpt = 000000002ebb5001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: nvidia(P)

Pid: 13662, comm: khelper Tainted: P          (2.6.26-durdane #1)
EIP: 0060:[<c0409df2>] EFLAGS: 00010002 CPU: 1
EIP is at _spin_lock_irqsave+0x13/0x25
EAX: 00000004 EBX: 00000000 ECX: 00000213 EDX: 00000100
ESI: 00000004 EDI: 00000000 EBP: 00000000 ESP: eeabffc8
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process khelper (pid: 13662, ti=eeabe000 task=eeb280c0 task.ti=eeabe000)
Stack: c012e788 ee8fb300 00000000 00000000 c014144c 00000000 c01413ec 
c01194ab
ee8fb300 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c012e788>] complete+0xf/0x36
[<c014144c>] wait_for_helper+0x60/0x65
[<c01413ec>] wait_for_helper+0x0/0x65
[<c01194ab>] kernel_thread_helper+0x7/0x10
=======================
Code: 90 89 e2 81 e2 00 e0 ff ff ff 42 14 f0 83 28 01 79 05 e8 ae ff ff ff 
c3 9c 59 fa 89 e2 81 e2 00 e0 ff ff ff 42 14 ba 00 01 00 00 <f0> 66 0f c1 
10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 fa 89 e2
EIP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25 SS:ESP 0068:eeabffc8
---[ end trace f5d38a13beb6dc77 ]---
note: khelper[13662] exited with preempt_count 1

I tried the same on a 2.6.28-rc8 kernel and got the same khelper error. 
Lovely.

So, my conclusion remains that cdev has a race condition due to insufficient 
locking between cdev_del/put and cdev_get. And that khelper has it's own 
problems as well :-(

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-17 13:37             ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 13:37 UTC (permalink / raw)
  To: Greg KH; +Cc: v4l, linux-kernel

On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
> > On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> > > On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> > > > On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> > > > > On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> > > > > > Hi Greg,
> > > > > >
> > > > > > Laurent found a race condition in the uvc driver that we traced
> > > > > > to the way chrdev_open and cdev_put/get work.
> > > > > >
> > > > > > You need the following ingredients to reproduce it:
> > > > > >
> > > > > > 1) a hot-pluggable char device like an USB webcam.
> > > > > > 2) a manually created device node for such a webcam instead of
> > > > > > relying on udev.
> > > > > >
> > > > > > In order to easily force this situation you would also need to
> > > > > > add a delay to the char device's release() function. For
> > > > > > webcams that would be at the top of v4l2_chardev_release() in
> > > > > > drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
> > > > > > cdev_purge would have the same effect.
> > > > > >
> > > > > > The sequence of events in the case of a webcam is as follows:
> > > > > >
> > > > > > 1) The USB device is removed, causing a disconnect.
> > > > > >
> > > > > > 2) The webcam driver unregisters the video device which in turn
> > > > > > calls cdev_del().
> > > > > >
> > > > > > 3) When the last application using the device is closed, the
> > > > > > cdev is released when the kref of the cdev's kobject goes to 0.
> > > > > >
> > > > > > 4) If the kref's release() call takes a while due to e.g. extra
> > > > > > cleanup in the case of a webcam, then another application can
> > > > > > try to open the video device. Note that this requires a device
> > > > > > node created with mknod, otherwise the device nodes would
> > > > > > already have been removed by udev.
> > > > > >
> > > > > > 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> > > > > > device node was never accessed before), then all is fine since
> > > > > > kobj_lookup will fail because cdev_del() has been called
> > > > > > earlier. However, if this device node was used earlier, then
> > > > > > the else part is called: cdev_get(p). This 'p' is the cdev that
> > > > > > is being released. Since the kref count is 0 you will get a
> > > > > > WARN message from kref_get, but the code continues on, the
> > > > > > f_op->open will (hopefully) return more-or-less gracefully with
> > > > > > an error and the cdev_put at the end will cause the refcount to
> > > > > > go to 0 again, which results in a SECOND call to the kref's
> > > > > > release function!
> > > > > >
> > > > > > See this link for the original discussion on the v4l list
> > > > > > containing stack traces an a patch that you need if you want to
> > > > > > (and can) test this with the uvc driver:
> > > > > >
> > > > > > http://www.spinics.net/lists/vfl/msg39967.html
> > > > >
> > > > > The second sentence in that message shows your problem here:
> > > > > 	To avoid the need of a reference count in every v4l2 driver,
> > > > > 	v4l2 moved to cdev which includes its own reference counting
> > > > > 	infrastructure based on kobject.
> > > > >
> > > > > cdev is not ment to handle the reference counting of any object
> > > > > outside of itself, and should never be embedded within anything.
> > > > > I've been thinking of taking the real "kobject" out of that
> > > > > structure for a long time now, incase someone did something
> > > > > foolish like this.
> > > > >
> > > > > Seems I was too late :(
> > > > >
> > > > > So, to solve this, just remove the reliance on struct cdev in
> > > > > your own structures, you don't want to do this for the very
> > > > > reason you have now found (and for others, like the fact that
> > > > > this isn't a "real" struct kobject in play here, just a fake
> > > > > one.)
> > > > >
> > > > > Ick, what a mess.
> > > >
> > > > Sorry, but this makes no sense. First of all the race condition
> > > > exists regardless of how v4l uses it. Other drivers using cdev with
> > > > a hot-pluggable device in combination with a manually created
> > > > device node should show the same problem.
> > >
> > > I don't see how this would be, unless this problem has always existed
> > > in the cdev code, it was created back before dynamic device nodes
> > > with udev existed :)
> > >
> > > > It's just that we found it with v4l because the release callback
> > > > takes longer than usual, thus increasing the chances of hitting the
> > > > race.
> > >
> > > The release callback for the cdev itself?
> >
> > Yes, although we override the release kobj_type.
>
> Hahaha, yeah, I'm really afraid to say it, but you deserve the oops if
> you do that :)
>
> > I noticed that a patch to do this through a cdev function was recently
> > posted (and possibly already merged).
>
> No, it was rejected as it was not needed.
>
> > > > The core problem is simply that it is possible to call cdev_get
> > > > while in cdev_put! That should never happen.
> > >
> > > True, and cdev_put is only called from __fput() which has the proper
> > > locking to handle the call to cdev_get() if a char device is opened,
> > > right?
> >
> > cdev_put is also called through cdev_del, and that's where the problem
> > resides. The cdev_del() call has no locking to prevent a call from
> > cdev_get(), and it can be called asynchronously as well in response to
> > a USB disconnect.
> >
> > > > Secondly, why shouldn't struct cdev be embedded in anything? It's
> > > > used in lots of drivers that way. I really don't see what's unusual
> > > > or messy about v4l in that respect.
> > >
> > > I don't see it embedded in any other structures at the moment, and
> > > used to reference count the structure, do you have pointers to where
> > > it is?
> >
> > Hmm, I took a closer look and it looks like v4l is indeed the first to
> > use cdev for refcounting. Others either don't need it or do their own
> > refcounting. It's definitely embedded in several structs, though. (grep
> > for cdev_init)
>
> Yes, embedding it is fine, just don't use it for reference counting as
> it is using it's own reference counting for something other than what
> you want to use it for.
>
> Overriding the release function should have given you the hint that this
> is not the structure that you should be using for this kind of
> operation.
>
> > However, I don't see how any amount of refcounting in drivers can
> > prevent this race. At some point cdev_del() is called and (if nobody is
> > using the chardev) cdev's kref goes to 0 triggering the release, and at
> > that moment chrdev_open() can be called and the driver has no chance to
> > prevent that, only cdev can do that.
>
> Again, don't use cdev's reference counting for your own object
> lifecycle, it is different and will cause problems, like you have found
> out.

Sigh. It has nothing to do with how v4l uses it. And to demonstrate this, 
here is how you reproduce it with the sg module (tested it with my USB 
harddisk).

1) apply this patch to char_dev.c:

--- char_dev.c.orig	2008-12-17 13:30:22.000000000 +0100
+++ char_dev.c	2008-12-17 08:50:23.000000000 +0100
@@ -412,6 +412,9 @@
 
 static void cdev_purge(struct cdev *cdev)
 {
+	printk("delayin\n");
+	mdelay(2000);
+	printk("delayout\n");
 	spin_lock(&cdev_lock);
 	while (!list_empty(&cdev->list)) {
 		struct inode *inode;


2) connect your harddisk and make note of the sgX device that's created.
3) use mknod to create an sgX device manually for that harddisk.
4) open and close it (cat ./sgX >/dev/null followed by ctrl-C is sufficient)
5) disconnect the harddisk
6) when 'delayin' appears in the log, run cat ./sgX >/dev/null again
7) sit back and enjoy this lovely race condition:

usb-storage: device found at 6
usb-storage: waiting for device to settle before scanning
scsi 6:0:0:0: Direct-Access     WD       5000BEV External 1.05 PQ: 0 ANSI: 4
sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
sd 6:0:0:0: [sdg] Write Protect is off
sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
sd 6:0:0:0: [sdg] Assuming drive cache: write through
sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
sd 6:0:0:0: [sdg] Write Protect is off
sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
sd 6:0:0:0: [sdg] Assuming drive cache: write through
sdg: sdg1
sd 6:0:0:0: [sdg] Attached SCSI disk
sd 6:0:0:0: Attached scsi generic sg8 type 0
usb-storage: device scan complete
usb 2-4.1: USB disconnect, address 6
delayin
------------[ cut here ]------------
WARNING: at lib/kref.c:43 kref_get+0x17/0x1c()
Modules linked in: sg
Pid: 1491, comm: cat Not tainted 2.6.26-durdane-test #3
[<c0134ad4>] warn_on_slowpath+0x40/0x79
[<c0160ff8>] __do_fault+0x2e4/0x334
[<c017e0c7>] __d_lookup+0xce/0x115
[<c0175393>] do_lookup+0x53/0x145
[<c017d523>] dput+0x16/0xdc
[<c0176d53>] __link_path_walk+0xb36/0xc30
[<c0181d44>] mntput_no_expire+0x18/0xef
[<c02225fb>] kref_get+0x17/0x1c
[<c0221be2>] kobject_get+0xf/0x13
[<c01710b1>] cdev_get+0x55/0x69
[<c01712ce>] chrdev_open+0xc2/0x184
[<c017120c>] chrdev_open+0x0/0x184
[<c016dd3b>] __dentry_open+0x127/0x210
[<c016dea1>] nameidata_to_filp+0x1c/0x2c
[<c01780c4>] do_filp_open+0x33d/0x6d5
[<c016daee>] get_unused_fd_flags+0xb8/0xc2
[<c016db35>] do_sys_open+0x3d/0xb6
[<c016dbf2>] sys_open+0x1e/0x23
[<c01188a9>] sysenter_past_esp+0x6a/0x91
=======================
---[ end trace c8225d49e3c03b85 ]---
delayin
delayout
delayout

Note the duplicate 'delayin' messages. Also note that the cdev struct was 
allocated by sg.c, so the second cdev cleanup will likely poke into already 
freed memory.

I tried to reproduce it as well without the patched char_dev.c by using this 
little program:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

main()
{
        int state = 0;

        while (1) {
                int fh = open("./sg8", O_RDONLY);
                if (fh >= 0) {
                        if (state == 0) {
                                printf("opened\n");
                                state = 1;
                        }
                        close(fh);
                }
                else {
                        if (state) {
                                printf("error\n");
                                state = 0;
                        }
                }
        }
}

Just run it and connect/disconnect the USB harddisk.

Unfortunately, I wasn't able to reproduce it because I always hit a bug in 
khelper first after only a few attempts:

sd 13:0:0:0: [sdg] Assuming drive cache: write through
sdg: sdg1
sd 13:0:0:0: [sdg] Attached SCSI disk
sd 13:0:0:0: Attached scsi generic sg8 type 0
usb-storage: device scan complete
usb 2-4.1: USB disconnect, address 13
VFS: Close: file count is 0
BUG: unable to handle kernel NULL pointer dereference at 00000004
IP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25
*pdpt = 000000002ebb5001 *pde = 0000000000000000
Oops: 0002 [#1] PREEMPT SMP
Modules linked in: nvidia(P)

Pid: 13662, comm: khelper Tainted: P          (2.6.26-durdane #1)
EIP: 0060:[<c0409df2>] EFLAGS: 00010002 CPU: 1
EIP is at _spin_lock_irqsave+0x13/0x25
EAX: 00000004 EBX: 00000000 ECX: 00000213 EDX: 00000100
ESI: 00000004 EDI: 00000000 EBP: 00000000 ESP: eeabffc8
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process khelper (pid: 13662, ti=eeabe000 task=eeb280c0 task.ti=eeabe000)
Stack: c012e788 ee8fb300 00000000 00000000 c014144c 00000000 c01413ec 
c01194ab
ee8fb300 00000000 00000000 00000000 00000000 00000000
Call Trace:
[<c012e788>] complete+0xf/0x36
[<c014144c>] wait_for_helper+0x60/0x65
[<c01413ec>] wait_for_helper+0x0/0x65
[<c01194ab>] kernel_thread_helper+0x7/0x10
=======================
Code: 90 89 e2 81 e2 00 e0 ff ff ff 42 14 f0 83 28 01 79 05 e8 ae ff ff ff 
c3 9c 59 fa 89 e2 81 e2 00 e0 ff ff ff 42 14 ba 00 01 00 00 <f0> 66 0f c1 
10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 fa 89 e2
EIP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25 SS:ESP 0068:eeabffc8
---[ end trace f5d38a13beb6dc77 ]---
note: khelper[13662] exited with preempt_count 1

I tried the same on a 2.6.28-rc8 kernel and got the same khelper error. 
Lovely.

So, my conclusion remains that cdev has a race condition due to insufficient 
locking between cdev_del/put and cdev_get. And that khelper has it's own 
problems as well :-(

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 13:37             ` Hans Verkuil
  (?)
@ 2008-12-17 14:52             ` Boaz Harrosh
  2008-12-17 15:07                 ` Hans Verkuil
  -1 siblings, 1 reply; 35+ messages in thread
From: Boaz Harrosh @ 2008-12-17 14:52 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: Greg KH, linux-kernel, v4l, Laurent Pinchart, Tejun Heo

Hans Verkuil wrote:
> On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
>> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
>>> On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
>>>> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
>>>>> On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
>>>>>> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
>>>>>>> Hi Greg,
>>>>>>>
>>>>>>> Laurent found a race condition in the uvc driver that we traced
>>>>>>> to the way chrdev_open and cdev_put/get work.
>>>>>>>
>>>>>>> You need the following ingredients to reproduce it:
>>>>>>>
>>>>>>> 1) a hot-pluggable char device like an USB webcam.
>>>>>>> 2) a manually created device node for such a webcam instead of
>>>>>>> relying on udev.
>>>>>>>
>>>>>>> In order to easily force this situation you would also need to
>>>>>>> add a delay to the char device's release() function. For
>>>>>>> webcams that would be at the top of v4l2_chardev_release() in
>>>>>>> drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
>>>>>>> cdev_purge would have the same effect.
>>>>>>>
>>>>>>> The sequence of events in the case of a webcam is as follows:
>>>>>>>
>>>>>>> 1) The USB device is removed, causing a disconnect.
>>>>>>>
>>>>>>> 2) The webcam driver unregisters the video device which in turn
>>>>>>> calls cdev_del().
>>>>>>>
>>>>>>> 3) When the last application using the device is closed, the
>>>>>>> cdev is released when the kref of the cdev's kobject goes to 0.
>>>>>>>
>>>>>>> 4) If the kref's release() call takes a while due to e.g. extra
>>>>>>> cleanup in the case of a webcam, then another application can
>>>>>>> try to open the video device. Note that this requires a device
>>>>>>> node created with mknod, otherwise the device nodes would
>>>>>>> already have been removed by udev.
>>>>>>>
>>>>>>> 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
>>>>>>> device node was never accessed before), then all is fine since
>>>>>>> kobj_lookup will fail because cdev_del() has been called
>>>>>>> earlier. However, if this device node was used earlier, then
>>>>>>> the else part is called: cdev_get(p). This 'p' is the cdev that
>>>>>>> is being released. Since the kref count is 0 you will get a
>>>>>>> WARN message from kref_get, but the code continues on, the
>>>>>>> f_op->open will (hopefully) return more-or-less gracefully with
>>>>>>> an error and the cdev_put at the end will cause the refcount to
>>>>>>> go to 0 again, which results in a SECOND call to the kref's
>>>>>>> release function!
>>>>>>>
>>>>>>> See this link for the original discussion on the v4l list
>>>>>>> containing stack traces an a patch that you need if you want to
>>>>>>> (and can) test this with the uvc driver:
>>>>>>>
>>>>>>> http://www.spinics.net/lists/vfl/msg39967.html
>>>>>> The second sentence in that message shows your problem here:
>>>>>> 	To avoid the need of a reference count in every v4l2 driver,
>>>>>> 	v4l2 moved to cdev which includes its own reference counting
>>>>>> 	infrastructure based on kobject.
>>>>>>
>>>>>> cdev is not ment to handle the reference counting of any object
>>>>>> outside of itself, and should never be embedded within anything.
>>>>>> I've been thinking of taking the real "kobject" out of that
>>>>>> structure for a long time now, incase someone did something
>>>>>> foolish like this.
>>>>>>
>>>>>> Seems I was too late :(
>>>>>>
>>>>>> So, to solve this, just remove the reliance on struct cdev in
>>>>>> your own structures, you don't want to do this for the very
>>>>>> reason you have now found (and for others, like the fact that
>>>>>> this isn't a "real" struct kobject in play here, just a fake
>>>>>> one.)
>>>>>>
>>>>>> Ick, what a mess.
>>>>> Sorry, but this makes no sense. First of all the race condition
>>>>> exists regardless of how v4l uses it. Other drivers using cdev with
>>>>> a hot-pluggable device in combination with a manually created
>>>>> device node should show the same problem.
>>>> I don't see how this would be, unless this problem has always existed
>>>> in the cdev code, it was created back before dynamic device nodes
>>>> with udev existed :)
>>>>
>>>>> It's just that we found it with v4l because the release callback
>>>>> takes longer than usual, thus increasing the chances of hitting the
>>>>> race.
>>>> The release callback for the cdev itself?
>>> Yes, although we override the release kobj_type.
>> Hahaha, yeah, I'm really afraid to say it, but you deserve the oops if
>> you do that :)
>>
>>> I noticed that a patch to do this through a cdev function was recently
>>> posted (and possibly already merged).
>> No, it was rejected as it was not needed.
>>
>>>>> The core problem is simply that it is possible to call cdev_get
>>>>> while in cdev_put! That should never happen.
>>>> True, and cdev_put is only called from __fput() which has the proper
>>>> locking to handle the call to cdev_get() if a char device is opened,
>>>> right?
>>> cdev_put is also called through cdev_del, and that's where the problem
>>> resides. The cdev_del() call has no locking to prevent a call from
>>> cdev_get(), and it can be called asynchronously as well in response to
>>> a USB disconnect.
>>>
>>>>> Secondly, why shouldn't struct cdev be embedded in anything? It's
>>>>> used in lots of drivers that way. I really don't see what's unusual
>>>>> or messy about v4l in that respect.
>>>> I don't see it embedded in any other structures at the moment, and
>>>> used to reference count the structure, do you have pointers to where
>>>> it is?
>>> Hmm, I took a closer look and it looks like v4l is indeed the first to
>>> use cdev for refcounting. Others either don't need it or do their own
>>> refcounting. It's definitely embedded in several structs, though. (grep
>>> for cdev_init)
>> Yes, embedding it is fine, just don't use it for reference counting as
>> it is using it's own reference counting for something other than what
>> you want to use it for.
>>
>> Overriding the release function should have given you the hint that this
>> is not the structure that you should be using for this kind of
>> operation.
>>
>>> However, I don't see how any amount of refcounting in drivers can
>>> prevent this race. At some point cdev_del() is called and (if nobody is
>>> using the chardev) cdev's kref goes to 0 triggering the release, and at
>>> that moment chrdev_open() can be called and the driver has no chance to
>>> prevent that, only cdev can do that.
>> Again, don't use cdev's reference counting for your own object
>> lifecycle, it is different and will cause problems, like you have found
>> out.
> 
> Sigh. It has nothing to do with how v4l uses it. And to demonstrate this, 
> here is how you reproduce it with the sg module (tested it with my USB 
> harddisk).
> 
> 1) apply this patch to char_dev.c:
> 
> --- char_dev.c.orig	2008-12-17 13:30:22.000000000 +0100
> +++ char_dev.c	2008-12-17 08:50:23.000000000 +0100
> @@ -412,6 +412,9 @@
>  
>  static void cdev_purge(struct cdev *cdev)
>  {
> +	printk("delayin\n");
> +	mdelay(2000);
> +	printk("delayout\n");
>  	spin_lock(&cdev_lock);
>  	while (!list_empty(&cdev->list)) {
>  		struct inode *inode;
> 
> 
> 2) connect your harddisk and make note of the sgX device that's created.
> 3) use mknod to create an sgX device manually for that harddisk.
> 4) open and close it (cat ./sgX >/dev/null followed by ctrl-C is sufficient)
> 5) disconnect the harddisk
> 6) when 'delayin' appears in the log, run cat ./sgX >/dev/null again
> 7) sit back and enjoy this lovely race condition:
> 
> usb-storage: device found at 6
> usb-storage: waiting for device to settle before scanning
> scsi 6:0:0:0: Direct-Access     WD       5000BEV External 1.05 PQ: 0 ANSI: 4
> sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
> sd 6:0:0:0: [sdg] Write Protect is off
> sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> sd 6:0:0:0: [sdg] Assuming drive cache: write through
> sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
> sd 6:0:0:0: [sdg] Write Protect is off
> sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> sd 6:0:0:0: [sdg] Assuming drive cache: write through
> sdg: sdg1
> sd 6:0:0:0: [sdg] Attached SCSI disk
> sd 6:0:0:0: Attached scsi generic sg8 type 0
> usb-storage: device scan complete
> usb 2-4.1: USB disconnect, address 6
> delayin
> ------------[ cut here ]------------
> WARNING: at lib/kref.c:43 kref_get+0x17/0x1c()
> Modules linked in: sg
> Pid: 1491, comm: cat Not tainted 2.6.26-durdane-test #3
> [<c0134ad4>] warn_on_slowpath+0x40/0x79
> [<c0160ff8>] __do_fault+0x2e4/0x334
> [<c017e0c7>] __d_lookup+0xce/0x115
> [<c0175393>] do_lookup+0x53/0x145
> [<c017d523>] dput+0x16/0xdc
> [<c0176d53>] __link_path_walk+0xb36/0xc30
> [<c0181d44>] mntput_no_expire+0x18/0xef
> [<c02225fb>] kref_get+0x17/0x1c
> [<c0221be2>] kobject_get+0xf/0x13
> [<c01710b1>] cdev_get+0x55/0x69
> [<c01712ce>] chrdev_open+0xc2/0x184
> [<c017120c>] chrdev_open+0x0/0x184
> [<c016dd3b>] __dentry_open+0x127/0x210
> [<c016dea1>] nameidata_to_filp+0x1c/0x2c
> [<c01780c4>] do_filp_open+0x33d/0x6d5
> [<c016daee>] get_unused_fd_flags+0xb8/0xc2
> [<c016db35>] do_sys_open+0x3d/0xb6
> [<c016dbf2>] sys_open+0x1e/0x23
> [<c01188a9>] sysenter_past_esp+0x6a/0x91
> =======================
> ---[ end trace c8225d49e3c03b85 ]---
> delayin
> delayout
> delayout
> 
> Note the duplicate 'delayin' messages. Also note that the cdev struct was 
> allocated by sg.c, so the second cdev cleanup will likely poke into already 
> freed memory.
> 
> I tried to reproduce it as well without the patched char_dev.c by using this 
> little program:
> 
> #include <stdio.h>
> #include <unistd.h>
> #include <fcntl.h>
> 
> main()
> {
>         int state = 0;
> 
>         while (1) {
>                 int fh = open("./sg8", O_RDONLY);
>                 if (fh >= 0) {
>                         if (state == 0) {
>                                 printf("opened\n");
>                                 state = 1;
>                         }
>                         close(fh);
>                 }
>                 else {
>                         if (state) {
>                                 printf("error\n");
>                                 state = 0;
>                         }
>                 }
>         }
> }
> 
> Just run it and connect/disconnect the USB harddisk.
> 
> Unfortunately, I wasn't able to reproduce it because I always hit a bug in 
> khelper first after only a few attempts:
> 
> sd 13:0:0:0: [sdg] Assuming drive cache: write through
> sdg: sdg1
> sd 13:0:0:0: [sdg] Attached SCSI disk
> sd 13:0:0:0: Attached scsi generic sg8 type 0
> usb-storage: device scan complete
> usb 2-4.1: USB disconnect, address 13
> VFS: Close: file count is 0
> BUG: unable to handle kernel NULL pointer dereference at 00000004
> IP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25
> *pdpt = 000000002ebb5001 *pde = 0000000000000000
> Oops: 0002 [#1] PREEMPT SMP
> Modules linked in: nvidia(P)
> 
> Pid: 13662, comm: khelper Tainted: P          (2.6.26-durdane #1)
> EIP: 0060:[<c0409df2>] EFLAGS: 00010002 CPU: 1
> EIP is at _spin_lock_irqsave+0x13/0x25
> EAX: 00000004 EBX: 00000000 ECX: 00000213 EDX: 00000100
> ESI: 00000004 EDI: 00000000 EBP: 00000000 ESP: eeabffc8
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process khelper (pid: 13662, ti=eeabe000 task=eeb280c0 task.ti=eeabe000)
> Stack: c012e788 ee8fb300 00000000 00000000 c014144c 00000000 c01413ec 
> c01194ab
> ee8fb300 00000000 00000000 00000000 00000000 00000000
> Call Trace:
> [<c012e788>] complete+0xf/0x36
> [<c014144c>] wait_for_helper+0x60/0x65
> [<c01413ec>] wait_for_helper+0x0/0x65
> [<c01194ab>] kernel_thread_helper+0x7/0x10
> =======================
> Code: 90 89 e2 81 e2 00 e0 ff ff ff 42 14 f0 83 28 01 79 05 e8 ae ff ff ff 
> c3 9c 59 fa 89 e2 81 e2 00 e0 ff ff ff 42 14 ba 00 01 00 00 <f0> 66 0f c1 
> 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 fa 89 e2
> EIP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25 SS:ESP 0068:eeabffc8
> ---[ end trace f5d38a13beb6dc77 ]---
> note: khelper[13662] exited with preempt_count 1
> 
> I tried the same on a 2.6.28-rc8 kernel and got the same khelper error. 
> Lovely.
> 
> So, my conclusion remains that cdev has a race condition due to insufficient 
> locking between cdev_del/put and cdev_get. And that khelper has it's own 
> problems as well :-(
> 
> Regards,
> 
> 	Hans
> 

Hi Hans

Just as a far shot could you try this patch:

[PATCH RESEND] char_dev: add cdev->release() and convert cdev_alloc() to use it
http://www.spinics.net/lists/kernel/msg808783.html

Your goals look related. Also see the lkml thread.

Please cc me on the matter.

Thanks
Boaz

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 14:52             ` Boaz Harrosh
@ 2008-12-17 15:07                 ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 15:07 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Greg KH, linux-kernel, v4l, Laurent Pinchart, Tejun Heo

On Wednesday 17 December 2008 15:52:38 Boaz Harrosh wrote:
> Hans Verkuil wrote:
> > On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
> >> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
> >>> On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> >>>> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> >>>>> On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> >>>>>> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> >>>>>>> Hi Greg,
> >>>>>>>
> >>>>>>> Laurent found a race condition in the uvc driver that we traced
> >>>>>>> to the way chrdev_open and cdev_put/get work.
> >>>>>>>
> >>>>>>> You need the following ingredients to reproduce it:
> >>>>>>>
> >>>>>>> 1) a hot-pluggable char device like an USB webcam.
> >>>>>>> 2) a manually created device node for such a webcam instead of
> >>>>>>> relying on udev.
> >>>>>>>
> >>>>>>> In order to easily force this situation you would also need to
> >>>>>>> add a delay to the char device's release() function. For
> >>>>>>> webcams that would be at the top of v4l2_chardev_release() in
> >>>>>>> drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
> >>>>>>> cdev_purge would have the same effect.
> >>>>>>>
> >>>>>>> The sequence of events in the case of a webcam is as follows:
> >>>>>>>
> >>>>>>> 1) The USB device is removed, causing a disconnect.
> >>>>>>>
> >>>>>>> 2) The webcam driver unregisters the video device which in turn
> >>>>>>> calls cdev_del().
> >>>>>>>
> >>>>>>> 3) When the last application using the device is closed, the
> >>>>>>> cdev is released when the kref of the cdev's kobject goes to 0.
> >>>>>>>
> >>>>>>> 4) If the kref's release() call takes a while due to e.g. extra
> >>>>>>> cleanup in the case of a webcam, then another application can
> >>>>>>> try to open the video device. Note that this requires a device
> >>>>>>> node created with mknod, otherwise the device nodes would
> >>>>>>> already have been removed by udev.
> >>>>>>>
> >>>>>>> 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> >>>>>>> device node was never accessed before), then all is fine since
> >>>>>>> kobj_lookup will fail because cdev_del() has been called
> >>>>>>> earlier. However, if this device node was used earlier, then
> >>>>>>> the else part is called: cdev_get(p). This 'p' is the cdev that
> >>>>>>> is being released. Since the kref count is 0 you will get a
> >>>>>>> WARN message from kref_get, but the code continues on, the
> >>>>>>> f_op->open will (hopefully) return more-or-less gracefully with
> >>>>>>> an error and the cdev_put at the end will cause the refcount to
> >>>>>>> go to 0 again, which results in a SECOND call to the kref's
> >>>>>>> release function!
> >>>>>>>
> >>>>>>> See this link for the original discussion on the v4l list
> >>>>>>> containing stack traces an a patch that you need if you want to
> >>>>>>> (and can) test this with the uvc driver:
> >>>>>>>
> >>>>>>> http://www.spinics.net/lists/vfl/msg39967.html
> >>>>>>
> >>>>>> The second sentence in that message shows your problem here:
> >>>>>> 	To avoid the need of a reference count in every v4l2 driver,
> >>>>>> 	v4l2 moved to cdev which includes its own reference counting
> >>>>>> 	infrastructure based on kobject.
> >>>>>>
> >>>>>> cdev is not ment to handle the reference counting of any object
> >>>>>> outside of itself, and should never be embedded within anything.
> >>>>>> I've been thinking of taking the real "kobject" out of that
> >>>>>> structure for a long time now, incase someone did something
> >>>>>> foolish like this.
> >>>>>>
> >>>>>> Seems I was too late :(
> >>>>>>
> >>>>>> So, to solve this, just remove the reliance on struct cdev in
> >>>>>> your own structures, you don't want to do this for the very
> >>>>>> reason you have now found (and for others, like the fact that
> >>>>>> this isn't a "real" struct kobject in play here, just a fake
> >>>>>> one.)
> >>>>>>
> >>>>>> Ick, what a mess.
> >>>>>
> >>>>> Sorry, but this makes no sense. First of all the race condition
> >>>>> exists regardless of how v4l uses it. Other drivers using cdev with
> >>>>> a hot-pluggable device in combination with a manually created
> >>>>> device node should show the same problem.
> >>>>
> >>>> I don't see how this would be, unless this problem has always
> >>>> existed in the cdev code, it was created back before dynamic device
> >>>> nodes with udev existed :)
> >>>>
> >>>>> It's just that we found it with v4l because the release callback
> >>>>> takes longer than usual, thus increasing the chances of hitting the
> >>>>> race.
> >>>>
> >>>> The release callback for the cdev itself?
> >>>
> >>> Yes, although we override the release kobj_type.
> >>
> >> Hahaha, yeah, I'm really afraid to say it, but you deserve the oops if
> >> you do that :)
> >>
> >>> I noticed that a patch to do this through a cdev function was
> >>> recently posted (and possibly already merged).
> >>
> >> No, it was rejected as it was not needed.
> >>
> >>>>> The core problem is simply that it is possible to call cdev_get
> >>>>> while in cdev_put! That should never happen.
> >>>>
> >>>> True, and cdev_put is only called from __fput() which has the proper
> >>>> locking to handle the call to cdev_get() if a char device is opened,
> >>>> right?
> >>>
> >>> cdev_put is also called through cdev_del, and that's where the
> >>> problem resides. The cdev_del() call has no locking to prevent a call
> >>> from cdev_get(), and it can be called asynchronously as well in
> >>> response to a USB disconnect.
> >>>
> >>>>> Secondly, why shouldn't struct cdev be embedded in anything? It's
> >>>>> used in lots of drivers that way. I really don't see what's unusual
> >>>>> or messy about v4l in that respect.
> >>>>
> >>>> I don't see it embedded in any other structures at the moment, and
> >>>> used to reference count the structure, do you have pointers to where
> >>>> it is?
> >>>
> >>> Hmm, I took a closer look and it looks like v4l is indeed the first
> >>> to use cdev for refcounting. Others either don't need it or do their
> >>> own refcounting. It's definitely embedded in several structs, though.
> >>> (grep for cdev_init)
> >>
> >> Yes, embedding it is fine, just don't use it for reference counting as
> >> it is using it's own reference counting for something other than what
> >> you want to use it for.
> >>
> >> Overriding the release function should have given you the hint that
> >> this is not the structure that you should be using for this kind of
> >> operation.
> >>
> >>> However, I don't see how any amount of refcounting in drivers can
> >>> prevent this race. At some point cdev_del() is called and (if nobody
> >>> is using the chardev) cdev's kref goes to 0 triggering the release,
> >>> and at that moment chrdev_open() can be called and the driver has no
> >>> chance to prevent that, only cdev can do that.
> >>
> >> Again, don't use cdev's reference counting for your own object
> >> lifecycle, it is different and will cause problems, like you have
> >> found out.
> >
> > Sigh. It has nothing to do with how v4l uses it. And to demonstrate
> > this, here is how you reproduce it with the sg module (tested it with
> > my USB harddisk).
> >
> > 1) apply this patch to char_dev.c:
> >
> > --- char_dev.c.orig	2008-12-17 13:30:22.000000000 +0100
> > +++ char_dev.c	2008-12-17 08:50:23.000000000 +0100
> > @@ -412,6 +412,9 @@
> >
> >  static void cdev_purge(struct cdev *cdev)
> >  {
> > +	printk("delayin\n");
> > +	mdelay(2000);
> > +	printk("delayout\n");
> >  	spin_lock(&cdev_lock);
> >  	while (!list_empty(&cdev->list)) {
> >  		struct inode *inode;
> >
> >
> > 2) connect your harddisk and make note of the sgX device that's
> > created. 3) use mknod to create an sgX device manually for that
> > harddisk. 4) open and close it (cat ./sgX >/dev/null followed by ctrl-C
> > is sufficient) 5) disconnect the harddisk
> > 6) when 'delayin' appears in the log, run cat ./sgX >/dev/null again
> > 7) sit back and enjoy this lovely race condition:
> >
> > usb-storage: device found at 6
> > usb-storage: waiting for device to settle before scanning
> > scsi 6:0:0:0: Direct-Access     WD       5000BEV External 1.05 PQ: 0
> > ANSI: 4 sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108
> > MB) sd 6:0:0:0: [sdg] Write Protect is off
> > sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> > sd 6:0:0:0: [sdg] Assuming drive cache: write through
> > sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
> > sd 6:0:0:0: [sdg] Write Protect is off
> > sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> > sd 6:0:0:0: [sdg] Assuming drive cache: write through
> > sdg: sdg1
> > sd 6:0:0:0: [sdg] Attached SCSI disk
> > sd 6:0:0:0: Attached scsi generic sg8 type 0
> > usb-storage: device scan complete
> > usb 2-4.1: USB disconnect, address 6
> > delayin
> > ------------[ cut here ]------------
> > WARNING: at lib/kref.c:43 kref_get+0x17/0x1c()
> > Modules linked in: sg
> > Pid: 1491, comm: cat Not tainted 2.6.26-durdane-test #3
> > [<c0134ad4>] warn_on_slowpath+0x40/0x79
> > [<c0160ff8>] __do_fault+0x2e4/0x334
> > [<c017e0c7>] __d_lookup+0xce/0x115
> > [<c0175393>] do_lookup+0x53/0x145
> > [<c017d523>] dput+0x16/0xdc
> > [<c0176d53>] __link_path_walk+0xb36/0xc30
> > [<c0181d44>] mntput_no_expire+0x18/0xef
> > [<c02225fb>] kref_get+0x17/0x1c
> > [<c0221be2>] kobject_get+0xf/0x13
> > [<c01710b1>] cdev_get+0x55/0x69
> > [<c01712ce>] chrdev_open+0xc2/0x184
> > [<c017120c>] chrdev_open+0x0/0x184
> > [<c016dd3b>] __dentry_open+0x127/0x210
> > [<c016dea1>] nameidata_to_filp+0x1c/0x2c
> > [<c01780c4>] do_filp_open+0x33d/0x6d5
> > [<c016daee>] get_unused_fd_flags+0xb8/0xc2
> > [<c016db35>] do_sys_open+0x3d/0xb6
> > [<c016dbf2>] sys_open+0x1e/0x23
> > [<c01188a9>] sysenter_past_esp+0x6a/0x91
> > =======================
> > ---[ end trace c8225d49e3c03b85 ]---
> > delayin
> > delayout
> > delayout
> >
> > Note the duplicate 'delayin' messages. Also note that the cdev struct
> > was allocated by sg.c, so the second cdev cleanup will likely poke into
> > already freed memory.
> >
> > I tried to reproduce it as well without the patched char_dev.c by using
> > this little program:
> >
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> >
> > main()
> > {
> >         int state = 0;
> >
> >         while (1) {
> >                 int fh = open("./sg8", O_RDONLY);
> >                 if (fh >= 0) {
> >                         if (state == 0) {
> >                                 printf("opened\n");
> >                                 state = 1;
> >                         }
> >                         close(fh);
> >                 }
> >                 else {
> >                         if (state) {
> >                                 printf("error\n");
> >                                 state = 0;
> >                         }
> >                 }
> >         }
> > }
> >
> > Just run it and connect/disconnect the USB harddisk.
> >
> > Unfortunately, I wasn't able to reproduce it because I always hit a bug
> > in khelper first after only a few attempts:
> >
> > sd 13:0:0:0: [sdg] Assuming drive cache: write through
> > sdg: sdg1
> > sd 13:0:0:0: [sdg] Attached SCSI disk
> > sd 13:0:0:0: Attached scsi generic sg8 type 0
> > usb-storage: device scan complete
> > usb 2-4.1: USB disconnect, address 13
> > VFS: Close: file count is 0
> > BUG: unable to handle kernel NULL pointer dereference at 00000004
> > IP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25
> > *pdpt = 000000002ebb5001 *pde = 0000000000000000
> > Oops: 0002 [#1] PREEMPT SMP
> > Modules linked in: nvidia(P)
> >
> > Pid: 13662, comm: khelper Tainted: P          (2.6.26-durdane #1)
> > EIP: 0060:[<c0409df2>] EFLAGS: 00010002 CPU: 1
> > EIP is at _spin_lock_irqsave+0x13/0x25
> > EAX: 00000004 EBX: 00000000 ECX: 00000213 EDX: 00000100
> > ESI: 00000004 EDI: 00000000 EBP: 00000000 ESP: eeabffc8
> > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > Process khelper (pid: 13662, ti=eeabe000 task=eeb280c0
> > task.ti=eeabe000) Stack: c012e788 ee8fb300 00000000 00000000 c014144c
> > 00000000 c01413ec c01194ab
> > ee8fb300 00000000 00000000 00000000 00000000 00000000
> > Call Trace:
> > [<c012e788>] complete+0xf/0x36
> > [<c014144c>] wait_for_helper+0x60/0x65
> > [<c01413ec>] wait_for_helper+0x0/0x65
> > [<c01194ab>] kernel_thread_helper+0x7/0x10
> > =======================
> > Code: 90 89 e2 81 e2 00 e0 ff ff ff 42 14 f0 83 28 01 79 05 e8 ae ff ff
> > ff c3 9c 59 fa 89 e2 81 e2 00 e0 ff ff ff 42 14 ba 00 01 00 00 <f0> 66
> > 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 fa 89 e2
> > EIP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25 SS:ESP 0068:eeabffc8
> > ---[ end trace f5d38a13beb6dc77 ]---
> > note: khelper[13662] exited with preempt_count 1
> >
> > I tried the same on a 2.6.28-rc8 kernel and got the same khelper error.
> > Lovely.
> >
> > So, my conclusion remains that cdev has a race condition due to
> > insufficient locking between cdev_del/put and cdev_get. And that
> > khelper has it's own problems as well :-(
> >
> > Regards,
> >
> > 	Hans
>
> Hi Hans
>
> Just as a far shot could you try this patch:
>
> [PATCH RESEND] char_dev: add cdev->release() and convert cdev_alloc() to
> use it http://www.spinics.net/lists/kernel/msg808783.html
>
> Your goals look related. Also see the lkml thread.
>
> Please cc me on the matter.

Hi Boaz,

It will not fix this particular race condition, but it would make it easier 
to hook the v4l release callback into cdev. It's a bit of a hack at the 
moment. Greg clearly doesn't like drivers doing things like this, but it's 
still not clear to me why not. As far as I can tell it is the only reliable 
place where you can be 100% certain that it is safe to do the cleanup since 
no one else is or can use the char device ever again.

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-17 15:07                 ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 15:07 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Tejun Heo, Greg KH, v4l, linux-kernel

On Wednesday 17 December 2008 15:52:38 Boaz Harrosh wrote:
> Hans Verkuil wrote:
> > On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
> >> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
> >>> On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> >>>> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> >>>>> On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> >>>>>> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> >>>>>>> Hi Greg,
> >>>>>>>
> >>>>>>> Laurent found a race condition in the uvc driver that we traced
> >>>>>>> to the way chrdev_open and cdev_put/get work.
> >>>>>>>
> >>>>>>> You need the following ingredients to reproduce it:
> >>>>>>>
> >>>>>>> 1) a hot-pluggable char device like an USB webcam.
> >>>>>>> 2) a manually created device node for such a webcam instead of
> >>>>>>> relying on udev.
> >>>>>>>
> >>>>>>> In order to easily force this situation you would also need to
> >>>>>>> add a delay to the char device's release() function. For
> >>>>>>> webcams that would be at the top of v4l2_chardev_release() in
> >>>>>>> drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
> >>>>>>> cdev_purge would have the same effect.
> >>>>>>>
> >>>>>>> The sequence of events in the case of a webcam is as follows:
> >>>>>>>
> >>>>>>> 1) The USB device is removed, causing a disconnect.
> >>>>>>>
> >>>>>>> 2) The webcam driver unregisters the video device which in turn
> >>>>>>> calls cdev_del().
> >>>>>>>
> >>>>>>> 3) When the last application using the device is closed, the
> >>>>>>> cdev is released when the kref of the cdev's kobject goes to 0.
> >>>>>>>
> >>>>>>> 4) If the kref's release() call takes a while due to e.g. extra
> >>>>>>> cleanup in the case of a webcam, then another application can
> >>>>>>> try to open the video device. Note that this requires a device
> >>>>>>> node created with mknod, otherwise the device nodes would
> >>>>>>> already have been removed by udev.
> >>>>>>>
> >>>>>>> 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> >>>>>>> device node was never accessed before), then all is fine since
> >>>>>>> kobj_lookup will fail because cdev_del() has been called
> >>>>>>> earlier. However, if this device node was used earlier, then
> >>>>>>> the else part is called: cdev_get(p). This 'p' is the cdev that
> >>>>>>> is being released. Since the kref count is 0 you will get a
> >>>>>>> WARN message from kref_get, but the code continues on, the
> >>>>>>> f_op->open will (hopefully) return more-or-less gracefully with
> >>>>>>> an error and the cdev_put at the end will cause the refcount to
> >>>>>>> go to 0 again, which results in a SECOND call to the kref's
> >>>>>>> release function!
> >>>>>>>
> >>>>>>> See this link for the original discussion on the v4l list
> >>>>>>> containing stack traces an a patch that you need if you want to
> >>>>>>> (and can) test this with the uvc driver:
> >>>>>>>
> >>>>>>> http://www.spinics.net/lists/vfl/msg39967.html
> >>>>>>
> >>>>>> The second sentence in that message shows your problem here:
> >>>>>> 	To avoid the need of a reference count in every v4l2 driver,
> >>>>>> 	v4l2 moved to cdev which includes its own reference counting
> >>>>>> 	infrastructure based on kobject.
> >>>>>>
> >>>>>> cdev is not ment to handle the reference counting of any object
> >>>>>> outside of itself, and should never be embedded within anything.
> >>>>>> I've been thinking of taking the real "kobject" out of that
> >>>>>> structure for a long time now, incase someone did something
> >>>>>> foolish like this.
> >>>>>>
> >>>>>> Seems I was too late :(
> >>>>>>
> >>>>>> So, to solve this, just remove the reliance on struct cdev in
> >>>>>> your own structures, you don't want to do this for the very
> >>>>>> reason you have now found (and for others, like the fact that
> >>>>>> this isn't a "real" struct kobject in play here, just a fake
> >>>>>> one.)
> >>>>>>
> >>>>>> Ick, what a mess.
> >>>>>
> >>>>> Sorry, but this makes no sense. First of all the race condition
> >>>>> exists regardless of how v4l uses it. Other drivers using cdev with
> >>>>> a hot-pluggable device in combination with a manually created
> >>>>> device node should show the same problem.
> >>>>
> >>>> I don't see how this would be, unless this problem has always
> >>>> existed in the cdev code, it was created back before dynamic device
> >>>> nodes with udev existed :)
> >>>>
> >>>>> It's just that we found it with v4l because the release callback
> >>>>> takes longer than usual, thus increasing the chances of hitting the
> >>>>> race.
> >>>>
> >>>> The release callback for the cdev itself?
> >>>
> >>> Yes, although we override the release kobj_type.
> >>
> >> Hahaha, yeah, I'm really afraid to say it, but you deserve the oops if
> >> you do that :)
> >>
> >>> I noticed that a patch to do this through a cdev function was
> >>> recently posted (and possibly already merged).
> >>
> >> No, it was rejected as it was not needed.
> >>
> >>>>> The core problem is simply that it is possible to call cdev_get
> >>>>> while in cdev_put! That should never happen.
> >>>>
> >>>> True, and cdev_put is only called from __fput() which has the proper
> >>>> locking to handle the call to cdev_get() if a char device is opened,
> >>>> right?
> >>>
> >>> cdev_put is also called through cdev_del, and that's where the
> >>> problem resides. The cdev_del() call has no locking to prevent a call
> >>> from cdev_get(), and it can be called asynchronously as well in
> >>> response to a USB disconnect.
> >>>
> >>>>> Secondly, why shouldn't struct cdev be embedded in anything? It's
> >>>>> used in lots of drivers that way. I really don't see what's unusual
> >>>>> or messy about v4l in that respect.
> >>>>
> >>>> I don't see it embedded in any other structures at the moment, and
> >>>> used to reference count the structure, do you have pointers to where
> >>>> it is?
> >>>
> >>> Hmm, I took a closer look and it looks like v4l is indeed the first
> >>> to use cdev for refcounting. Others either don't need it or do their
> >>> own refcounting. It's definitely embedded in several structs, though.
> >>> (grep for cdev_init)
> >>
> >> Yes, embedding it is fine, just don't use it for reference counting as
> >> it is using it's own reference counting for something other than what
> >> you want to use it for.
> >>
> >> Overriding the release function should have given you the hint that
> >> this is not the structure that you should be using for this kind of
> >> operation.
> >>
> >>> However, I don't see how any amount of refcounting in drivers can
> >>> prevent this race. At some point cdev_del() is called and (if nobody
> >>> is using the chardev) cdev's kref goes to 0 triggering the release,
> >>> and at that moment chrdev_open() can be called and the driver has no
> >>> chance to prevent that, only cdev can do that.
> >>
> >> Again, don't use cdev's reference counting for your own object
> >> lifecycle, it is different and will cause problems, like you have
> >> found out.
> >
> > Sigh. It has nothing to do with how v4l uses it. And to demonstrate
> > this, here is how you reproduce it with the sg module (tested it with
> > my USB harddisk).
> >
> > 1) apply this patch to char_dev.c:
> >
> > --- char_dev.c.orig	2008-12-17 13:30:22.000000000 +0100
> > +++ char_dev.c	2008-12-17 08:50:23.000000000 +0100
> > @@ -412,6 +412,9 @@
> >
> >  static void cdev_purge(struct cdev *cdev)
> >  {
> > +	printk("delayin\n");
> > +	mdelay(2000);
> > +	printk("delayout\n");
> >  	spin_lock(&cdev_lock);
> >  	while (!list_empty(&cdev->list)) {
> >  		struct inode *inode;
> >
> >
> > 2) connect your harddisk and make note of the sgX device that's
> > created. 3) use mknod to create an sgX device manually for that
> > harddisk. 4) open and close it (cat ./sgX >/dev/null followed by ctrl-C
> > is sufficient) 5) disconnect the harddisk
> > 6) when 'delayin' appears in the log, run cat ./sgX >/dev/null again
> > 7) sit back and enjoy this lovely race condition:
> >
> > usb-storage: device found at 6
> > usb-storage: waiting for device to settle before scanning
> > scsi 6:0:0:0: Direct-Access     WD       5000BEV External 1.05 PQ: 0
> > ANSI: 4 sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108
> > MB) sd 6:0:0:0: [sdg] Write Protect is off
> > sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> > sd 6:0:0:0: [sdg] Assuming drive cache: write through
> > sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
> > sd 6:0:0:0: [sdg] Write Protect is off
> > sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> > sd 6:0:0:0: [sdg] Assuming drive cache: write through
> > sdg: sdg1
> > sd 6:0:0:0: [sdg] Attached SCSI disk
> > sd 6:0:0:0: Attached scsi generic sg8 type 0
> > usb-storage: device scan complete
> > usb 2-4.1: USB disconnect, address 6
> > delayin
> > ------------[ cut here ]------------
> > WARNING: at lib/kref.c:43 kref_get+0x17/0x1c()
> > Modules linked in: sg
> > Pid: 1491, comm: cat Not tainted 2.6.26-durdane-test #3
> > [<c0134ad4>] warn_on_slowpath+0x40/0x79
> > [<c0160ff8>] __do_fault+0x2e4/0x334
> > [<c017e0c7>] __d_lookup+0xce/0x115
> > [<c0175393>] do_lookup+0x53/0x145
> > [<c017d523>] dput+0x16/0xdc
> > [<c0176d53>] __link_path_walk+0xb36/0xc30
> > [<c0181d44>] mntput_no_expire+0x18/0xef
> > [<c02225fb>] kref_get+0x17/0x1c
> > [<c0221be2>] kobject_get+0xf/0x13
> > [<c01710b1>] cdev_get+0x55/0x69
> > [<c01712ce>] chrdev_open+0xc2/0x184
> > [<c017120c>] chrdev_open+0x0/0x184
> > [<c016dd3b>] __dentry_open+0x127/0x210
> > [<c016dea1>] nameidata_to_filp+0x1c/0x2c
> > [<c01780c4>] do_filp_open+0x33d/0x6d5
> > [<c016daee>] get_unused_fd_flags+0xb8/0xc2
> > [<c016db35>] do_sys_open+0x3d/0xb6
> > [<c016dbf2>] sys_open+0x1e/0x23
> > [<c01188a9>] sysenter_past_esp+0x6a/0x91
> > =======================
> > ---[ end trace c8225d49e3c03b85 ]---
> > delayin
> > delayout
> > delayout
> >
> > Note the duplicate 'delayin' messages. Also note that the cdev struct
> > was allocated by sg.c, so the second cdev cleanup will likely poke into
> > already freed memory.
> >
> > I tried to reproduce it as well without the patched char_dev.c by using
> > this little program:
> >
> > #include <stdio.h>
> > #include <unistd.h>
> > #include <fcntl.h>
> >
> > main()
> > {
> >         int state = 0;
> >
> >         while (1) {
> >                 int fh = open("./sg8", O_RDONLY);
> >                 if (fh >= 0) {
> >                         if (state == 0) {
> >                                 printf("opened\n");
> >                                 state = 1;
> >                         }
> >                         close(fh);
> >                 }
> >                 else {
> >                         if (state) {
> >                                 printf("error\n");
> >                                 state = 0;
> >                         }
> >                 }
> >         }
> > }
> >
> > Just run it and connect/disconnect the USB harddisk.
> >
> > Unfortunately, I wasn't able to reproduce it because I always hit a bug
> > in khelper first after only a few attempts:
> >
> > sd 13:0:0:0: [sdg] Assuming drive cache: write through
> > sdg: sdg1
> > sd 13:0:0:0: [sdg] Attached SCSI disk
> > sd 13:0:0:0: Attached scsi generic sg8 type 0
> > usb-storage: device scan complete
> > usb 2-4.1: USB disconnect, address 13
> > VFS: Close: file count is 0
> > BUG: unable to handle kernel NULL pointer dereference at 00000004
> > IP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25
> > *pdpt = 000000002ebb5001 *pde = 0000000000000000
> > Oops: 0002 [#1] PREEMPT SMP
> > Modules linked in: nvidia(P)
> >
> > Pid: 13662, comm: khelper Tainted: P          (2.6.26-durdane #1)
> > EIP: 0060:[<c0409df2>] EFLAGS: 00010002 CPU: 1
> > EIP is at _spin_lock_irqsave+0x13/0x25
> > EAX: 00000004 EBX: 00000000 ECX: 00000213 EDX: 00000100
> > ESI: 00000004 EDI: 00000000 EBP: 00000000 ESP: eeabffc8
> > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > Process khelper (pid: 13662, ti=eeabe000 task=eeb280c0
> > task.ti=eeabe000) Stack: c012e788 ee8fb300 00000000 00000000 c014144c
> > 00000000 c01413ec c01194ab
> > ee8fb300 00000000 00000000 00000000 00000000 00000000
> > Call Trace:
> > [<c012e788>] complete+0xf/0x36
> > [<c014144c>] wait_for_helper+0x60/0x65
> > [<c01413ec>] wait_for_helper+0x0/0x65
> > [<c01194ab>] kernel_thread_helper+0x7/0x10
> > =======================
> > Code: 90 89 e2 81 e2 00 e0 ff ff ff 42 14 f0 83 28 01 79 05 e8 ae ff ff
> > ff c3 9c 59 fa 89 e2 81 e2 00 e0 ff ff ff 42 14 ba 00 01 00 00 <f0> 66
> > 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 fa 89 e2
> > EIP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25 SS:ESP 0068:eeabffc8
> > ---[ end trace f5d38a13beb6dc77 ]---
> > note: khelper[13662] exited with preempt_count 1
> >
> > I tried the same on a 2.6.28-rc8 kernel and got the same khelper error.
> > Lovely.
> >
> > So, my conclusion remains that cdev has a race condition due to
> > insufficient locking between cdev_del/put and cdev_get. And that
> > khelper has it's own problems as well :-(
> >
> > Regards,
> >
> > 	Hans
>
> Hi Hans
>
> Just as a far shot could you try this patch:
>
> [PATCH RESEND] char_dev: add cdev->release() and convert cdev_alloc() to
> use it http://www.spinics.net/lists/kernel/msg808783.html
>
> Your goals look related. Also see the lkml thread.
>
> Please cc me on the matter.

Hi Boaz,

It will not fix this particular race condition, but it would make it easier 
to hook the v4l release callback into cdev. It's a bit of a hack at the 
moment. Greg clearly doesn't like drivers doing things like this, but it's 
still not clear to me why not. As far as I can tell it is the only reliable 
place where you can be 100% certain that it is safe to do the cleanup since 
no one else is or can use the char device ever again.

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 15:07                 ` Hans Verkuil
  (?)
@ 2008-12-17 16:09                 ` Boaz Harrosh
  2008-12-17 17:33                     ` Hans Verkuil
  -1 siblings, 1 reply; 35+ messages in thread
From: Boaz Harrosh @ 2008-12-17 16:09 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: Greg KH, linux-kernel, v4l, Laurent Pinchart, Tejun Heo

Hans Verkuil wrote:
> On Wednesday 17 December 2008 15:52:38 Boaz Harrosh wrote:
>> Hans Verkuil wrote:
>>> On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
>>>> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
>>>>> On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
>>>>>> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
>>>>>>> On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
>>>>>>>> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
>>>>>>>>> Hi Greg,
>>>>>>>>>
>>>>>>>>> Laurent found a race condition in the uvc driver that we traced
>>>>>>>>> to the way chrdev_open and cdev_put/get work.
>>>>>>>>>
>>>>>>>>> You need the following ingredients to reproduce it:
>>>>>>>>>
>>>>>>>>> 1) a hot-pluggable char device like an USB webcam.
>>>>>>>>> 2) a manually created device node for such a webcam instead of
>>>>>>>>> relying on udev.
>>>>>>>>>
>>>>>>>>> In order to easily force this situation you would also need to
>>>>>>>>> add a delay to the char device's release() function. For
>>>>>>>>> webcams that would be at the top of v4l2_chardev_release() in
>>>>>>>>> drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
>>>>>>>>> cdev_purge would have the same effect.
>>>>>>>>>
>>>>>>>>> The sequence of events in the case of a webcam is as follows:
>>>>>>>>>
>>>>>>>>> 1) The USB device is removed, causing a disconnect.
>>>>>>>>>
>>>>>>>>> 2) The webcam driver unregisters the video device which in turn
>>>>>>>>> calls cdev_del().
>>>>>>>>>
>>>>>>>>> 3) When the last application using the device is closed, the
>>>>>>>>> cdev is released when the kref of the cdev's kobject goes to 0.
>>>>>>>>>
>>>>>>>>> 4) If the kref's release() call takes a while due to e.g. extra
>>>>>>>>> cleanup in the case of a webcam, then another application can
>>>>>>>>> try to open the video device. Note that this requires a device
>>>>>>>>> node created with mknod, otherwise the device nodes would
>>>>>>>>> already have been removed by udev.
>>>>>>>>>
>>>>>>>>> 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
>>>>>>>>> device node was never accessed before), then all is fine since
>>>>>>>>> kobj_lookup will fail because cdev_del() has been called
>>>>>>>>> earlier. However, if this device node was used earlier, then
>>>>>>>>> the else part is called: cdev_get(p). This 'p' is the cdev that
>>>>>>>>> is being released. Since the kref count is 0 you will get a
>>>>>>>>> WARN message from kref_get, but the code continues on, the
>>>>>>>>> f_op->open will (hopefully) return more-or-less gracefully with
>>>>>>>>> an error and the cdev_put at the end will cause the refcount to
>>>>>>>>> go to 0 again, which results in a SECOND call to the kref's
>>>>>>>>> release function!
>>>>>>>>>
>>>>>>>>> See this link for the original discussion on the v4l list
>>>>>>>>> containing stack traces an a patch that you need if you want to
>>>>>>>>> (and can) test this with the uvc driver:
>>>>>>>>>
>>>>>>>>> http://www.spinics.net/lists/vfl/msg39967.html
>>>>>>>> The second sentence in that message shows your problem here:
>>>>>>>> 	To avoid the need of a reference count in every v4l2 driver,
>>>>>>>> 	v4l2 moved to cdev which includes its own reference counting
>>>>>>>> 	infrastructure based on kobject.
>>>>>>>>
>>>>>>>> cdev is not ment to handle the reference counting of any object
>>>>>>>> outside of itself, and should never be embedded within anything.
>>>>>>>> I've been thinking of taking the real "kobject" out of that
>>>>>>>> structure for a long time now, incase someone did something
>>>>>>>> foolish like this.
>>>>>>>>
>>>>>>>> Seems I was too late :(
>>>>>>>>
>>>>>>>> So, to solve this, just remove the reliance on struct cdev in
>>>>>>>> your own structures, you don't want to do this for the very
>>>>>>>> reason you have now found (and for others, like the fact that
>>>>>>>> this isn't a "real" struct kobject in play here, just a fake
>>>>>>>> one.)
>>>>>>>>
>>>>>>>> Ick, what a mess.
>>>>>>> Sorry, but this makes no sense. First of all the race condition
>>>>>>> exists regardless of how v4l uses it. Other drivers using cdev with
>>>>>>> a hot-pluggable device in combination with a manually created
>>>>>>> device node should show the same problem.
>>>>>> I don't see how this would be, unless this problem has always
>>>>>> existed in the cdev code, it was created back before dynamic device
>>>>>> nodes with udev existed :)
>>>>>>
>>>>>>> It's just that we found it with v4l because the release callback
>>>>>>> takes longer than usual, thus increasing the chances of hitting the
>>>>>>> race.
>>>>>> The release callback for the cdev itself?
>>>>> Yes, although we override the release kobj_type.
>>>> Hahaha, yeah, I'm really afraid to say it, but you deserve the oops if
>>>> you do that :)
>>>>
>>>>> I noticed that a patch to do this through a cdev function was
>>>>> recently posted (and possibly already merged).
>>>> No, it was rejected as it was not needed.
>>>>
>>>>>>> The core problem is simply that it is possible to call cdev_get
>>>>>>> while in cdev_put! That should never happen.
>>>>>> True, and cdev_put is only called from __fput() which has the proper
>>>>>> locking to handle the call to cdev_get() if a char device is opened,
>>>>>> right?
>>>>> cdev_put is also called through cdev_del, and that's where the
>>>>> problem resides. The cdev_del() call has no locking to prevent a call
>>>>> from cdev_get(), and it can be called asynchronously as well in
>>>>> response to a USB disconnect.
>>>>>
>>>>>>> Secondly, why shouldn't struct cdev be embedded in anything? It's
>>>>>>> used in lots of drivers that way. I really don't see what's unusual
>>>>>>> or messy about v4l in that respect.
>>>>>> I don't see it embedded in any other structures at the moment, and
>>>>>> used to reference count the structure, do you have pointers to where
>>>>>> it is?
>>>>> Hmm, I took a closer look and it looks like v4l is indeed the first
>>>>> to use cdev for refcounting. Others either don't need it or do their
>>>>> own refcounting. It's definitely embedded in several structs, though.
>>>>> (grep for cdev_init)
>>>> Yes, embedding it is fine, just don't use it for reference counting as
>>>> it is using it's own reference counting for something other than what
>>>> you want to use it for.
>>>>
>>>> Overriding the release function should have given you the hint that
>>>> this is not the structure that you should be using for this kind of
>>>> operation.
>>>>
>>>>> However, I don't see how any amount of refcounting in drivers can
>>>>> prevent this race. At some point cdev_del() is called and (if nobody
>>>>> is using the chardev) cdev's kref goes to 0 triggering the release,
>>>>> and at that moment chrdev_open() can be called and the driver has no
>>>>> chance to prevent that, only cdev can do that.
>>>> Again, don't use cdev's reference counting for your own object
>>>> lifecycle, it is different and will cause problems, like you have
>>>> found out.
>>> Sigh. It has nothing to do with how v4l uses it. And to demonstrate
>>> this, here is how you reproduce it with the sg module (tested it with
>>> my USB harddisk).
>>>
>>> 1) apply this patch to char_dev.c:
>>>
>>> --- char_dev.c.orig	2008-12-17 13:30:22.000000000 +0100
>>> +++ char_dev.c	2008-12-17 08:50:23.000000000 +0100
>>> @@ -412,6 +412,9 @@
>>>
>>>  static void cdev_purge(struct cdev *cdev)
>>>  {
>>> +	printk("delayin\n");
>>> +	mdelay(2000);
>>> +	printk("delayout\n");
>>>  	spin_lock(&cdev_lock);
>>>  	while (!list_empty(&cdev->list)) {
>>>  		struct inode *inode;
>>>
>>>
>>> 2) connect your harddisk and make note of the sgX device that's
>>> created. 3) use mknod to create an sgX device manually for that
>>> harddisk. 4) open and close it (cat ./sgX >/dev/null followed by ctrl-C
>>> is sufficient) 5) disconnect the harddisk
>>> 6) when 'delayin' appears in the log, run cat ./sgX >/dev/null again
>>> 7) sit back and enjoy this lovely race condition:
>>>
>>> usb-storage: device found at 6
>>> usb-storage: waiting for device to settle before scanning
>>> scsi 6:0:0:0: Direct-Access     WD       5000BEV External 1.05 PQ: 0
>>> ANSI: 4 sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108
>>> MB) sd 6:0:0:0: [sdg] Write Protect is off
>>> sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
>>> sd 6:0:0:0: [sdg] Assuming drive cache: write through
>>> sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
>>> sd 6:0:0:0: [sdg] Write Protect is off
>>> sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
>>> sd 6:0:0:0: [sdg] Assuming drive cache: write through
>>> sdg: sdg1
>>> sd 6:0:0:0: [sdg] Attached SCSI disk
>>> sd 6:0:0:0: Attached scsi generic sg8 type 0
>>> usb-storage: device scan complete
>>> usb 2-4.1: USB disconnect, address 6
>>> delayin
>>> ------------[ cut here ]------------
>>> WARNING: at lib/kref.c:43 kref_get+0x17/0x1c()
>>> Modules linked in: sg
>>> Pid: 1491, comm: cat Not tainted 2.6.26-durdane-test #3
>>> [<c0134ad4>] warn_on_slowpath+0x40/0x79
>>> [<c0160ff8>] __do_fault+0x2e4/0x334
>>> [<c017e0c7>] __d_lookup+0xce/0x115
>>> [<c0175393>] do_lookup+0x53/0x145
>>> [<c017d523>] dput+0x16/0xdc
>>> [<c0176d53>] __link_path_walk+0xb36/0xc30
>>> [<c0181d44>] mntput_no_expire+0x18/0xef
>>> [<c02225fb>] kref_get+0x17/0x1c
>>> [<c0221be2>] kobject_get+0xf/0x13
>>> [<c01710b1>] cdev_get+0x55/0x69
>>> [<c01712ce>] chrdev_open+0xc2/0x184
>>> [<c017120c>] chrdev_open+0x0/0x184
>>> [<c016dd3b>] __dentry_open+0x127/0x210
>>> [<c016dea1>] nameidata_to_filp+0x1c/0x2c
>>> [<c01780c4>] do_filp_open+0x33d/0x6d5
>>> [<c016daee>] get_unused_fd_flags+0xb8/0xc2
>>> [<c016db35>] do_sys_open+0x3d/0xb6
>>> [<c016dbf2>] sys_open+0x1e/0x23
>>> [<c01188a9>] sysenter_past_esp+0x6a/0x91
>>> =======================
>>> ---[ end trace c8225d49e3c03b85 ]---
>>> delayin
>>> delayout
>>> delayout

OK Now I do not understand something. How is the double release possible?
I guess it is something to do with the complicated chrdev_open() in its
inode->i_cdev == NULL case. But still isn't the kref inside kobject 
suppose to protect me from exactly that?

>>>
>>> Note the duplicate 'delayin' messages. Also note that the cdev struct
>>> was allocated by sg.c, so the second cdev cleanup will likely poke into
>>> already freed memory.
>>>
>>> I tried to reproduce it as well without the patched char_dev.c by using
>>> this little program:
>>>
>>> #include <stdio.h>
>>> #include <unistd.h>
>>> #include <fcntl.h>
>>>
>>> main()
>>> {
>>>         int state = 0;
>>>
>>>         while (1) {
>>>                 int fh = open("./sg8", O_RDONLY);
>>>                 if (fh >= 0) {
>>>                         if (state == 0) {
>>>                                 printf("opened\n");
>>>                                 state = 1;
>>>                         }
>>>                         close(fh);
>>>                 }
>>>                 else {
>>>                         if (state) {
>>>                                 printf("error\n");
>>>                                 state = 0;
>>>                         }
>>>                 }
>>>         }
>>> }
>>>
>>> Just run it and connect/disconnect the USB harddisk.
>>>
>>> Unfortunately, I wasn't able to reproduce it because I always hit a bug
>>> in khelper first after only a few attempts:
>>>
>>> sd 13:0:0:0: [sdg] Assuming drive cache: write through
>>> sdg: sdg1
>>> sd 13:0:0:0: [sdg] Attached SCSI disk
>>> sd 13:0:0:0: Attached scsi generic sg8 type 0
>>> usb-storage: device scan complete
>>> usb 2-4.1: USB disconnect, address 13
>>> VFS: Close: file count is 0
>>> BUG: unable to handle kernel NULL pointer dereference at 00000004
>>> IP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25
>>> *pdpt = 000000002ebb5001 *pde = 0000000000000000
>>> Oops: 0002 [#1] PREEMPT SMP
>>> Modules linked in: nvidia(P)
>>>
>>> Pid: 13662, comm: khelper Tainted: P          (2.6.26-durdane #1)
>>> EIP: 0060:[<c0409df2>] EFLAGS: 00010002 CPU: 1
>>> EIP is at _spin_lock_irqsave+0x13/0x25
>>> EAX: 00000004 EBX: 00000000 ECX: 00000213 EDX: 00000100
>>> ESI: 00000004 EDI: 00000000 EBP: 00000000 ESP: eeabffc8
>>> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
>>> Process khelper (pid: 13662, ti=eeabe000 task=eeb280c0
>>> task.ti=eeabe000) Stack: c012e788 ee8fb300 00000000 00000000 c014144c
>>> 00000000 c01413ec c01194ab
>>> ee8fb300 00000000 00000000 00000000 00000000 00000000
>>> Call Trace:
>>> [<c012e788>] complete+0xf/0x36
>>> [<c014144c>] wait_for_helper+0x60/0x65
>>> [<c01413ec>] wait_for_helper+0x0/0x65
>>> [<c01194ab>] kernel_thread_helper+0x7/0x10
>>> =======================
>>> Code: 90 89 e2 81 e2 00 e0 ff ff ff 42 14 f0 83 28 01 79 05 e8 ae ff ff
>>> ff c3 9c 59 fa 89 e2 81 e2 00 e0 ff ff ff 42 14 ba 00 01 00 00 <f0> 66
>>> 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 fa 89 e2
>>> EIP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25 SS:ESP 0068:eeabffc8
>>> ---[ end trace f5d38a13beb6dc77 ]---
>>> note: khelper[13662] exited with preempt_count 1
>>>
>>> I tried the same on a 2.6.28-rc8 kernel and got the same khelper error.
>>> Lovely.
>>>
>>> So, my conclusion remains that cdev has a race condition due to
>>> insufficient locking between cdev_del/put and cdev_get. And that
>>> khelper has it's own problems as well :-(
>>>
>>> Regards,
>>>
>>> 	Hans
>> Hi Hans
>>
>> Just as a far shot could you try this patch:
>>
>> [PATCH RESEND] char_dev: add cdev->release() and convert cdev_alloc() to
>> use it http://www.spinics.net/lists/kernel/msg808783.html
>>
>> Your goals look related. Also see the lkml thread.
>>
>> Please cc me on the matter.
> 
> Hi Boaz,
> 
> It will not fix this particular race condition, but it would make it easier 
> to hook the v4l release callback into cdev. It's a bit of a hack at the 
> moment. Greg clearly doesn't like drivers doing things like this, but it's 
> still not clear to me why not. As far as I can tell it is the only reliable 
> place where you can be 100% certain that it is safe to do the cleanup since 
> no one else is or can use the char device ever again.
> 
> Regards,
> 
> 	Hans
> 

I had a similar problem and solved it in a way that I think should be safe.
I'll try and run your tests to make sure. Here is a short description of my
solution:

MyDevice {
   ...
   embed_cdev;
   embed_kref;
   ...
};

OnHotPlug() {
	...
	kref_init(embed_kref); // kref==1
	cdev_add(embed_cdev);  // cdev==1
	get_(embed_cdev);      // cdev++==2
}

OnFileOpen() { // kernel does // cdev++==n > 2
	kref_get(embed_kref); // kref++==n > 1
}

OnFileClose() {
	kref_put(embed_kref, __release); // kref++==n >= 0
} // kernel does // cdev--==n >= 1

OnHotRemove() {
	cdev_del(embed_cdev);  // cdev--==1
	kref_put(embed_kref, __release); // kref--==n >= 0
}

__release() { // by definition kref==0
	put_(embed_cdev);      // cdev--==0
}

What do you think?

BTW sg does not use kref and I suspect it might be racy by its own.

Boaz

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 16:09                 ` Boaz Harrosh
@ 2008-12-17 17:33                     ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 17:33 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Greg KH, linux-kernel, v4l, Laurent Pinchart, Tejun Heo

On Wednesday 17 December 2008 17:09:27 Boaz Harrosh wrote:
> Hans Verkuil wrote:
> > On Wednesday 17 December 2008 15:52:38 Boaz Harrosh wrote:
> >> Hans Verkuil wrote:
> >>> On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
> >>>> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
> >>>>> On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> >>>>>> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> >>>>>>> On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> >>>>>>>> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> >>>>>>>>> Hi Greg,
> >>>>>>>>>
> >>>>>>>>> Laurent found a race condition in the uvc driver that we traced
> >>>>>>>>> to the way chrdev_open and cdev_put/get work.
> >>>>>>>>>
> >>>>>>>>> You need the following ingredients to reproduce it:
> >>>>>>>>>
> >>>>>>>>> 1) a hot-pluggable char device like an USB webcam.
> >>>>>>>>> 2) a manually created device node for such a webcam instead of
> >>>>>>>>> relying on udev.
> >>>>>>>>>
> >>>>>>>>> In order to easily force this situation you would also need to
> >>>>>>>>> add a delay to the char device's release() function. For
> >>>>>>>>> webcams that would be at the top of v4l2_chardev_release() in
> >>>>>>>>> drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
> >>>>>>>>> cdev_purge would have the same effect.
> >>>>>>>>>
> >>>>>>>>> The sequence of events in the case of a webcam is as follows:
> >>>>>>>>>
> >>>>>>>>> 1) The USB device is removed, causing a disconnect.
> >>>>>>>>>
> >>>>>>>>> 2) The webcam driver unregisters the video device which in turn
> >>>>>>>>> calls cdev_del().
> >>>>>>>>>
> >>>>>>>>> 3) When the last application using the device is closed, the
> >>>>>>>>> cdev is released when the kref of the cdev's kobject goes to 0.
> >>>>>>>>>
> >>>>>>>>> 4) If the kref's release() call takes a while due to e.g. extra
> >>>>>>>>> cleanup in the case of a webcam, then another application can
> >>>>>>>>> try to open the video device. Note that this requires a device
> >>>>>>>>> node created with mknod, otherwise the device nodes would
> >>>>>>>>> already have been removed by udev.
> >>>>>>>>>
> >>>>>>>>> 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> >>>>>>>>> device node was never accessed before), then all is fine since
> >>>>>>>>> kobj_lookup will fail because cdev_del() has been called
> >>>>>>>>> earlier. However, if this device node was used earlier, then
> >>>>>>>>> the else part is called: cdev_get(p). This 'p' is the cdev that
> >>>>>>>>> is being released. Since the kref count is 0 you will get a
> >>>>>>>>> WARN message from kref_get, but the code continues on, the
> >>>>>>>>> f_op->open will (hopefully) return more-or-less gracefully with
> >>>>>>>>> an error and the cdev_put at the end will cause the refcount to
> >>>>>>>>> go to 0 again, which results in a SECOND call to the kref's
> >>>>>>>>> release function!
> >>>>>>>>>
> >>>>>>>>> See this link for the original discussion on the v4l list
> >>>>>>>>> containing stack traces an a patch that you need if you want to
> >>>>>>>>> (and can) test this with the uvc driver:
> >>>>>>>>>
> >>>>>>>>> http://www.spinics.net/lists/vfl/msg39967.html
> >>>>>>>>
> >>>>>>>> The second sentence in that message shows your problem here:
> >>>>>>>> 	To avoid the need of a reference count in every v4l2 driver,
> >>>>>>>> 	v4l2 moved to cdev which includes its own reference counting
> >>>>>>>> 	infrastructure based on kobject.
> >>>>>>>>
> >>>>>>>> cdev is not ment to handle the reference counting of any object
> >>>>>>>> outside of itself, and should never be embedded within anything.
> >>>>>>>> I've been thinking of taking the real "kobject" out of that
> >>>>>>>> structure for a long time now, incase someone did something
> >>>>>>>> foolish like this.
> >>>>>>>>
> >>>>>>>> Seems I was too late :(
> >>>>>>>>
> >>>>>>>> So, to solve this, just remove the reliance on struct cdev in
> >>>>>>>> your own structures, you don't want to do this for the very
> >>>>>>>> reason you have now found (and for others, like the fact that
> >>>>>>>> this isn't a "real" struct kobject in play here, just a fake
> >>>>>>>> one.)
> >>>>>>>>
> >>>>>>>> Ick, what a mess.
> >>>>>>>
> >>>>>>> Sorry, but this makes no sense. First of all the race condition
> >>>>>>> exists regardless of how v4l uses it. Other drivers using cdev
> >>>>>>> with a hot-pluggable device in combination with a manually
> >>>>>>> created device node should show the same problem.
> >>>>>>
> >>>>>> I don't see how this would be, unless this problem has always
> >>>>>> existed in the cdev code, it was created back before dynamic
> >>>>>> device nodes with udev existed :)
> >>>>>>
> >>>>>>> It's just that we found it with v4l because the release callback
> >>>>>>> takes longer than usual, thus increasing the chances of hitting
> >>>>>>> the race.
> >>>>>>
> >>>>>> The release callback for the cdev itself?
> >>>>>
> >>>>> Yes, although we override the release kobj_type.
> >>>>
> >>>> Hahaha, yeah, I'm really afraid to say it, but you deserve the oops
> >>>> if you do that :)
> >>>>
> >>>>> I noticed that a patch to do this through a cdev function was
> >>>>> recently posted (and possibly already merged).
> >>>>
> >>>> No, it was rejected as it was not needed.
> >>>>
> >>>>>>> The core problem is simply that it is possible to call cdev_get
> >>>>>>> while in cdev_put! That should never happen.
> >>>>>>
> >>>>>> True, and cdev_put is only called from __fput() which has the
> >>>>>> proper locking to handle the call to cdev_get() if a char device
> >>>>>> is opened, right?
> >>>>>
> >>>>> cdev_put is also called through cdev_del, and that's where the
> >>>>> problem resides. The cdev_del() call has no locking to prevent a
> >>>>> call from cdev_get(), and it can be called asynchronously as well
> >>>>> in response to a USB disconnect.
> >>>>>
> >>>>>>> Secondly, why shouldn't struct cdev be embedded in anything? It's
> >>>>>>> used in lots of drivers that way. I really don't see what's
> >>>>>>> unusual or messy about v4l in that respect.
> >>>>>>
> >>>>>> I don't see it embedded in any other structures at the moment, and
> >>>>>> used to reference count the structure, do you have pointers to
> >>>>>> where it is?
> >>>>>
> >>>>> Hmm, I took a closer look and it looks like v4l is indeed the first
> >>>>> to use cdev for refcounting. Others either don't need it or do
> >>>>> their own refcounting. It's definitely embedded in several structs,
> >>>>> though. (grep for cdev_init)
> >>>>
> >>>> Yes, embedding it is fine, just don't use it for reference counting
> >>>> as it is using it's own reference counting for something other than
> >>>> what you want to use it for.
> >>>>
> >>>> Overriding the release function should have given you the hint that
> >>>> this is not the structure that you should be using for this kind of
> >>>> operation.
> >>>>
> >>>>> However, I don't see how any amount of refcounting in drivers can
> >>>>> prevent this race. At some point cdev_del() is called and (if
> >>>>> nobody is using the chardev) cdev's kref goes to 0 triggering the
> >>>>> release, and at that moment chrdev_open() can be called and the
> >>>>> driver has no chance to prevent that, only cdev can do that.
> >>>>
> >>>> Again, don't use cdev's reference counting for your own object
> >>>> lifecycle, it is different and will cause problems, like you have
> >>>> found out.
> >>>
> >>> Sigh. It has nothing to do with how v4l uses it. And to demonstrate
> >>> this, here is how you reproduce it with the sg module (tested it with
> >>> my USB harddisk).
> >>>
> >>> 1) apply this patch to char_dev.c:
> >>>
> >>> --- char_dev.c.orig	2008-12-17 13:30:22.000000000 +0100
> >>> +++ char_dev.c	2008-12-17 08:50:23.000000000 +0100
> >>> @@ -412,6 +412,9 @@
> >>>
> >>>  static void cdev_purge(struct cdev *cdev)
> >>>  {
> >>> +	printk("delayin\n");
> >>> +	mdelay(2000);
> >>> +	printk("delayout\n");
> >>>  	spin_lock(&cdev_lock);
> >>>  	while (!list_empty(&cdev->list)) {
> >>>  		struct inode *inode;
> >>>
> >>>
> >>> 2) connect your harddisk and make note of the sgX device that's
> >>> created. 3) use mknod to create an sgX device manually for that
> >>> harddisk. 4) open and close it (cat ./sgX >/dev/null followed by
> >>> ctrl-C is sufficient) 5) disconnect the harddisk
> >>> 6) when 'delayin' appears in the log, run cat ./sgX >/dev/null again
> >>> 7) sit back and enjoy this lovely race condition:
> >>>
> >>> usb-storage: device found at 6
> >>> usb-storage: waiting for device to settle before scanning
> >>> scsi 6:0:0:0: Direct-Access     WD       5000BEV External 1.05 PQ: 0
> >>> ANSI: 4 sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108
> >>> MB) sd 6:0:0:0: [sdg] Write Protect is off
> >>> sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> >>> sd 6:0:0:0: [sdg] Assuming drive cache: write through
> >>> sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
> >>> sd 6:0:0:0: [sdg] Write Protect is off
> >>> sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> >>> sd 6:0:0:0: [sdg] Assuming drive cache: write through
> >>> sdg: sdg1
> >>> sd 6:0:0:0: [sdg] Attached SCSI disk
> >>> sd 6:0:0:0: Attached scsi generic sg8 type 0
> >>> usb-storage: device scan complete
> >>> usb 2-4.1: USB disconnect, address 6
> >>> delayin
> >>> ------------[ cut here ]------------
> >>> WARNING: at lib/kref.c:43 kref_get+0x17/0x1c()
> >>> Modules linked in: sg
> >>> Pid: 1491, comm: cat Not tainted 2.6.26-durdane-test #3
> >>> [<c0134ad4>] warn_on_slowpath+0x40/0x79
> >>> [<c0160ff8>] __do_fault+0x2e4/0x334
> >>> [<c017e0c7>] __d_lookup+0xce/0x115
> >>> [<c0175393>] do_lookup+0x53/0x145
> >>> [<c017d523>] dput+0x16/0xdc
> >>> [<c0176d53>] __link_path_walk+0xb36/0xc30
> >>> [<c0181d44>] mntput_no_expire+0x18/0xef
> >>> [<c02225fb>] kref_get+0x17/0x1c
> >>> [<c0221be2>] kobject_get+0xf/0x13
> >>> [<c01710b1>] cdev_get+0x55/0x69
> >>> [<c01712ce>] chrdev_open+0xc2/0x184
> >>> [<c017120c>] chrdev_open+0x0/0x184
> >>> [<c016dd3b>] __dentry_open+0x127/0x210
> >>> [<c016dea1>] nameidata_to_filp+0x1c/0x2c
> >>> [<c01780c4>] do_filp_open+0x33d/0x6d5
> >>> [<c016daee>] get_unused_fd_flags+0xb8/0xc2
> >>> [<c016db35>] do_sys_open+0x3d/0xb6
> >>> [<c016dbf2>] sys_open+0x1e/0x23
> >>> [<c01188a9>] sysenter_past_esp+0x6a/0x91
> >>> =======================
> >>> ---[ end trace c8225d49e3c03b85 ]---
> >>> delayin
> >>> delayout
> >>> delayout
>
> OK Now I do not understand something. How is the double release possible?
> I guess it is something to do with the complicated chrdev_open() in its
> inode->i_cdev == NULL case. But still isn't the kref inside kobject
> suppose to protect me from exactly that?

Only if there is also proper locking to prevent a kref_get from being called 
when the release of a kref_put is in progress. And that's missing in cdev.

Typical scenario: a USB device is disconnected, the driver sees that no 
applications are using it, then it calls cdev_del. This calls cdev_put, the 
cdev's refcount goes to 0 and it will call release. BUT, at this moment an 
application can open the device *again* and chrdev_open() will reuse the 
i_cdev pointer and call cdev_get(p) on it, and this increases the refcount 
from 0 to 1 again. And later it calls cdev_put again and release is called 
a second time.

>
> >>> Note the duplicate 'delayin' messages. Also note that the cdev struct
> >>> was allocated by sg.c, so the second cdev cleanup will likely poke
> >>> into already freed memory.
> >>>
> >>> I tried to reproduce it as well without the patched char_dev.c by
> >>> using this little program:
> >>>
> >>> #include <stdio.h>
> >>> #include <unistd.h>
> >>> #include <fcntl.h>
> >>>
> >>> main()
> >>> {
> >>>         int state = 0;
> >>>
> >>>         while (1) {
> >>>                 int fh = open("./sg8", O_RDONLY);
> >>>                 if (fh >= 0) {
> >>>                         if (state == 0) {
> >>>                                 printf("opened\n");
> >>>                                 state = 1;
> >>>                         }
> >>>                         close(fh);
> >>>                 }
> >>>                 else {
> >>>                         if (state) {
> >>>                                 printf("error\n");
> >>>                                 state = 0;
> >>>                         }
> >>>                 }
> >>>         }
> >>> }
> >>>
> >>> Just run it and connect/disconnect the USB harddisk.
> >>>
> >>> Unfortunately, I wasn't able to reproduce it because I always hit a
> >>> bug in khelper first after only a few attempts:
> >>>
> >>> sd 13:0:0:0: [sdg] Assuming drive cache: write through
> >>> sdg: sdg1
> >>> sd 13:0:0:0: [sdg] Attached SCSI disk
> >>> sd 13:0:0:0: Attached scsi generic sg8 type 0
> >>> usb-storage: device scan complete
> >>> usb 2-4.1: USB disconnect, address 13
> >>> VFS: Close: file count is 0
> >>> BUG: unable to handle kernel NULL pointer dereference at 00000004
> >>> IP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25
> >>> *pdpt = 000000002ebb5001 *pde = 0000000000000000
> >>> Oops: 0002 [#1] PREEMPT SMP
> >>> Modules linked in: nvidia(P)
> >>>
> >>> Pid: 13662, comm: khelper Tainted: P          (2.6.26-durdane #1)
> >>> EIP: 0060:[<c0409df2>] EFLAGS: 00010002 CPU: 1
> >>> EIP is at _spin_lock_irqsave+0x13/0x25
> >>> EAX: 00000004 EBX: 00000000 ECX: 00000213 EDX: 00000100
> >>> ESI: 00000004 EDI: 00000000 EBP: 00000000 ESP: eeabffc8
> >>> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> >>> Process khelper (pid: 13662, ti=eeabe000 task=eeb280c0
> >>> task.ti=eeabe000) Stack: c012e788 ee8fb300 00000000 00000000 c014144c
> >>> 00000000 c01413ec c01194ab
> >>> ee8fb300 00000000 00000000 00000000 00000000 00000000
> >>> Call Trace:
> >>> [<c012e788>] complete+0xf/0x36
> >>> [<c014144c>] wait_for_helper+0x60/0x65
> >>> [<c01413ec>] wait_for_helper+0x0/0x65
> >>> [<c01194ab>] kernel_thread_helper+0x7/0x10
> >>> =======================
> >>> Code: 90 89 e2 81 e2 00 e0 ff ff ff 42 14 f0 83 28 01 79 05 e8 ae ff
> >>> ff ff c3 9c 59 fa 89 e2 81 e2 00 e0 ff ff ff 42 14 ba 00 01 00 00
> >>> <f0> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 fa 89 e2
> >>> EIP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25 SS:ESP 0068:eeabffc8
> >>> ---[ end trace f5d38a13beb6dc77 ]---
> >>> note: khelper[13662] exited with preempt_count 1
> >>>
> >>> I tried the same on a 2.6.28-rc8 kernel and got the same khelper
> >>> error. Lovely.
> >>>
> >>> So, my conclusion remains that cdev has a race condition due to
> >>> insufficient locking between cdev_del/put and cdev_get. And that
> >>> khelper has it's own problems as well :-(
> >>>
> >>> Regards,
> >>>
> >>> 	Hans
> >>
> >> Hi Hans
> >>
> >> Just as a far shot could you try this patch:
> >>
> >> [PATCH RESEND] char_dev: add cdev->release() and convert cdev_alloc()
> >> to use it http://www.spinics.net/lists/kernel/msg808783.html
> >>
> >> Your goals look related. Also see the lkml thread.
> >>
> >> Please cc me on the matter.
> >
> > Hi Boaz,
> >
> > It will not fix this particular race condition, but it would make it
> > easier to hook the v4l release callback into cdev. It's a bit of a hack
> > at the moment. Greg clearly doesn't like drivers doing things like
> > this, but it's still not clear to me why not. As far as I can tell it
> > is the only reliable place where you can be 100% certain that it is
> > safe to do the cleanup since no one else is or can use the char device
> > ever again.
> >
> > Regards,
> >
> > 	Hans
>
> I had a similar problem and solved it in a way that I think should be
> safe. I'll try and run your tests to make sure. Here is a short
> description of my solution:
>
> MyDevice {
>    ...
>    embed_cdev;
>    embed_kref;
>    ...
> };
>
> OnHotPlug() {
> 	...
> 	kref_init(embed_kref); // kref==1
> 	cdev_add(embed_cdev);  // cdev==1
> 	get_(embed_cdev);      // cdev++==2
> }
>
> OnFileOpen() { // kernel does // cdev++==n > 2
> 	kref_get(embed_kref); // kref++==n > 1
> }
>
> OnFileClose() {
> 	kref_put(embed_kref, __release); // kref++==n >= 0
> } // kernel does // cdev--==n >= 1
>
> OnHotRemove() {
> 	cdev_del(embed_cdev);  // cdev--==1
> 	kref_put(embed_kref, __release); // kref--==n >= 0
> }
>
> __release() { // by definition kref==0
> 	put_(embed_cdev);      // cdev--==0
> }
>
> What do you think?

This won't help either. __release calls cdev_put, the cdev refcount goes to 
0, the cdev release is called, but at this time someone can open the device 
again. Refcounting in the driver simply won't help since chrdev_open is 
always called before the driver has a chance to check anything.

In addition, I think it is nuts to introduce a kref that just shadows the 
cdev's kref. We should be able to rely on cdev for this.

I expect that when you test your driver with this you should hit the same 
race condition. Remember that you need to mknod the device. If you rely on 
udev then this will never happen because udev has removed the device node 
before cdev_del is called. (At least, I think that is always true. I'm no 
expert on this.)

> BTW sg does not use kref and I suspect it might be racy by its own.

It might, but it has nothing to do with this bug.

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-17 17:33                     ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 17:33 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Tejun Heo, Greg KH, v4l, linux-kernel

On Wednesday 17 December 2008 17:09:27 Boaz Harrosh wrote:
> Hans Verkuil wrote:
> > On Wednesday 17 December 2008 15:52:38 Boaz Harrosh wrote:
> >> Hans Verkuil wrote:
> >>> On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
> >>>> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
> >>>>> On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> >>>>>> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> >>>>>>> On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> >>>>>>>> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> >>>>>>>>> Hi Greg,
> >>>>>>>>>
> >>>>>>>>> Laurent found a race condition in the uvc driver that we traced
> >>>>>>>>> to the way chrdev_open and cdev_put/get work.
> >>>>>>>>>
> >>>>>>>>> You need the following ingredients to reproduce it:
> >>>>>>>>>
> >>>>>>>>> 1) a hot-pluggable char device like an USB webcam.
> >>>>>>>>> 2) a manually created device node for such a webcam instead of
> >>>>>>>>> relying on udev.
> >>>>>>>>>
> >>>>>>>>> In order to easily force this situation you would also need to
> >>>>>>>>> add a delay to the char device's release() function. For
> >>>>>>>>> webcams that would be at the top of v4l2_chardev_release() in
> >>>>>>>>> drivers/media/video/v4l2-dev.c. But adding a delay to e.g.
> >>>>>>>>> cdev_purge would have the same effect.
> >>>>>>>>>
> >>>>>>>>> The sequence of events in the case of a webcam is as follows:
> >>>>>>>>>
> >>>>>>>>> 1) The USB device is removed, causing a disconnect.
> >>>>>>>>>
> >>>>>>>>> 2) The webcam driver unregisters the video device which in turn
> >>>>>>>>> calls cdev_del().
> >>>>>>>>>
> >>>>>>>>> 3) When the last application using the device is closed, the
> >>>>>>>>> cdev is released when the kref of the cdev's kobject goes to 0.
> >>>>>>>>>
> >>>>>>>>> 4) If the kref's release() call takes a while due to e.g. extra
> >>>>>>>>> cleanup in the case of a webcam, then another application can
> >>>>>>>>> try to open the video device. Note that this requires a device
> >>>>>>>>> node created with mknod, otherwise the device nodes would
> >>>>>>>>> already have been removed by udev.
> >>>>>>>>>
> >>>>>>>>> 5) chrdev_open checks inode->i_cdev. If this is NULL (i.e. this
> >>>>>>>>> device node was never accessed before), then all is fine since
> >>>>>>>>> kobj_lookup will fail because cdev_del() has been called
> >>>>>>>>> earlier. However, if this device node was used earlier, then
> >>>>>>>>> the else part is called: cdev_get(p). This 'p' is the cdev that
> >>>>>>>>> is being released. Since the kref count is 0 you will get a
> >>>>>>>>> WARN message from kref_get, but the code continues on, the
> >>>>>>>>> f_op->open will (hopefully) return more-or-less gracefully with
> >>>>>>>>> an error and the cdev_put at the end will cause the refcount to
> >>>>>>>>> go to 0 again, which results in a SECOND call to the kref's
> >>>>>>>>> release function!
> >>>>>>>>>
> >>>>>>>>> See this link for the original discussion on the v4l list
> >>>>>>>>> containing stack traces an a patch that you need if you want to
> >>>>>>>>> (and can) test this with the uvc driver:
> >>>>>>>>>
> >>>>>>>>> http://www.spinics.net/lists/vfl/msg39967.html
> >>>>>>>>
> >>>>>>>> The second sentence in that message shows your problem here:
> >>>>>>>> 	To avoid the need of a reference count in every v4l2 driver,
> >>>>>>>> 	v4l2 moved to cdev which includes its own reference counting
> >>>>>>>> 	infrastructure based on kobject.
> >>>>>>>>
> >>>>>>>> cdev is not ment to handle the reference counting of any object
> >>>>>>>> outside of itself, and should never be embedded within anything.
> >>>>>>>> I've been thinking of taking the real "kobject" out of that
> >>>>>>>> structure for a long time now, incase someone did something
> >>>>>>>> foolish like this.
> >>>>>>>>
> >>>>>>>> Seems I was too late :(
> >>>>>>>>
> >>>>>>>> So, to solve this, just remove the reliance on struct cdev in
> >>>>>>>> your own structures, you don't want to do this for the very
> >>>>>>>> reason you have now found (and for others, like the fact that
> >>>>>>>> this isn't a "real" struct kobject in play here, just a fake
> >>>>>>>> one.)
> >>>>>>>>
> >>>>>>>> Ick, what a mess.
> >>>>>>>
> >>>>>>> Sorry, but this makes no sense. First of all the race condition
> >>>>>>> exists regardless of how v4l uses it. Other drivers using cdev
> >>>>>>> with a hot-pluggable device in combination with a manually
> >>>>>>> created device node should show the same problem.
> >>>>>>
> >>>>>> I don't see how this would be, unless this problem has always
> >>>>>> existed in the cdev code, it was created back before dynamic
> >>>>>> device nodes with udev existed :)
> >>>>>>
> >>>>>>> It's just that we found it with v4l because the release callback
> >>>>>>> takes longer than usual, thus increasing the chances of hitting
> >>>>>>> the race.
> >>>>>>
> >>>>>> The release callback for the cdev itself?
> >>>>>
> >>>>> Yes, although we override the release kobj_type.
> >>>>
> >>>> Hahaha, yeah, I'm really afraid to say it, but you deserve the oops
> >>>> if you do that :)
> >>>>
> >>>>> I noticed that a patch to do this through a cdev function was
> >>>>> recently posted (and possibly already merged).
> >>>>
> >>>> No, it was rejected as it was not needed.
> >>>>
> >>>>>>> The core problem is simply that it is possible to call cdev_get
> >>>>>>> while in cdev_put! That should never happen.
> >>>>>>
> >>>>>> True, and cdev_put is only called from __fput() which has the
> >>>>>> proper locking to handle the call to cdev_get() if a char device
> >>>>>> is opened, right?
> >>>>>
> >>>>> cdev_put is also called through cdev_del, and that's where the
> >>>>> problem resides. The cdev_del() call has no locking to prevent a
> >>>>> call from cdev_get(), and it can be called asynchronously as well
> >>>>> in response to a USB disconnect.
> >>>>>
> >>>>>>> Secondly, why shouldn't struct cdev be embedded in anything? It's
> >>>>>>> used in lots of drivers that way. I really don't see what's
> >>>>>>> unusual or messy about v4l in that respect.
> >>>>>>
> >>>>>> I don't see it embedded in any other structures at the moment, and
> >>>>>> used to reference count the structure, do you have pointers to
> >>>>>> where it is?
> >>>>>
> >>>>> Hmm, I took a closer look and it looks like v4l is indeed the first
> >>>>> to use cdev for refcounting. Others either don't need it or do
> >>>>> their own refcounting. It's definitely embedded in several structs,
> >>>>> though. (grep for cdev_init)
> >>>>
> >>>> Yes, embedding it is fine, just don't use it for reference counting
> >>>> as it is using it's own reference counting for something other than
> >>>> what you want to use it for.
> >>>>
> >>>> Overriding the release function should have given you the hint that
> >>>> this is not the structure that you should be using for this kind of
> >>>> operation.
> >>>>
> >>>>> However, I don't see how any amount of refcounting in drivers can
> >>>>> prevent this race. At some point cdev_del() is called and (if
> >>>>> nobody is using the chardev) cdev's kref goes to 0 triggering the
> >>>>> release, and at that moment chrdev_open() can be called and the
> >>>>> driver has no chance to prevent that, only cdev can do that.
> >>>>
> >>>> Again, don't use cdev's reference counting for your own object
> >>>> lifecycle, it is different and will cause problems, like you have
> >>>> found out.
> >>>
> >>> Sigh. It has nothing to do with how v4l uses it. And to demonstrate
> >>> this, here is how you reproduce it with the sg module (tested it with
> >>> my USB harddisk).
> >>>
> >>> 1) apply this patch to char_dev.c:
> >>>
> >>> --- char_dev.c.orig	2008-12-17 13:30:22.000000000 +0100
> >>> +++ char_dev.c	2008-12-17 08:50:23.000000000 +0100
> >>> @@ -412,6 +412,9 @@
> >>>
> >>>  static void cdev_purge(struct cdev *cdev)
> >>>  {
> >>> +	printk("delayin\n");
> >>> +	mdelay(2000);
> >>> +	printk("delayout\n");
> >>>  	spin_lock(&cdev_lock);
> >>>  	while (!list_empty(&cdev->list)) {
> >>>  		struct inode *inode;
> >>>
> >>>
> >>> 2) connect your harddisk and make note of the sgX device that's
> >>> created. 3) use mknod to create an sgX device manually for that
> >>> harddisk. 4) open and close it (cat ./sgX >/dev/null followed by
> >>> ctrl-C is sufficient) 5) disconnect the harddisk
> >>> 6) when 'delayin' appears in the log, run cat ./sgX >/dev/null again
> >>> 7) sit back and enjoy this lovely race condition:
> >>>
> >>> usb-storage: device found at 6
> >>> usb-storage: waiting for device to settle before scanning
> >>> scsi 6:0:0:0: Direct-Access     WD       5000BEV External 1.05 PQ: 0
> >>> ANSI: 4 sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108
> >>> MB) sd 6:0:0:0: [sdg] Write Protect is off
> >>> sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> >>> sd 6:0:0:0: [sdg] Assuming drive cache: write through
> >>> sd 6:0:0:0: [sdg] 976773168 512-byte hardware sectors (500108 MB)
> >>> sd 6:0:0:0: [sdg] Write Protect is off
> >>> sd 6:0:0:0: [sdg] Mode Sense: 21 00 00 00
> >>> sd 6:0:0:0: [sdg] Assuming drive cache: write through
> >>> sdg: sdg1
> >>> sd 6:0:0:0: [sdg] Attached SCSI disk
> >>> sd 6:0:0:0: Attached scsi generic sg8 type 0
> >>> usb-storage: device scan complete
> >>> usb 2-4.1: USB disconnect, address 6
> >>> delayin
> >>> ------------[ cut here ]------------
> >>> WARNING: at lib/kref.c:43 kref_get+0x17/0x1c()
> >>> Modules linked in: sg
> >>> Pid: 1491, comm: cat Not tainted 2.6.26-durdane-test #3
> >>> [<c0134ad4>] warn_on_slowpath+0x40/0x79
> >>> [<c0160ff8>] __do_fault+0x2e4/0x334
> >>> [<c017e0c7>] __d_lookup+0xce/0x115
> >>> [<c0175393>] do_lookup+0x53/0x145
> >>> [<c017d523>] dput+0x16/0xdc
> >>> [<c0176d53>] __link_path_walk+0xb36/0xc30
> >>> [<c0181d44>] mntput_no_expire+0x18/0xef
> >>> [<c02225fb>] kref_get+0x17/0x1c
> >>> [<c0221be2>] kobject_get+0xf/0x13
> >>> [<c01710b1>] cdev_get+0x55/0x69
> >>> [<c01712ce>] chrdev_open+0xc2/0x184
> >>> [<c017120c>] chrdev_open+0x0/0x184
> >>> [<c016dd3b>] __dentry_open+0x127/0x210
> >>> [<c016dea1>] nameidata_to_filp+0x1c/0x2c
> >>> [<c01780c4>] do_filp_open+0x33d/0x6d5
> >>> [<c016daee>] get_unused_fd_flags+0xb8/0xc2
> >>> [<c016db35>] do_sys_open+0x3d/0xb6
> >>> [<c016dbf2>] sys_open+0x1e/0x23
> >>> [<c01188a9>] sysenter_past_esp+0x6a/0x91
> >>> =======================
> >>> ---[ end trace c8225d49e3c03b85 ]---
> >>> delayin
> >>> delayout
> >>> delayout
>
> OK Now I do not understand something. How is the double release possible?
> I guess it is something to do with the complicated chrdev_open() in its
> inode->i_cdev == NULL case. But still isn't the kref inside kobject
> suppose to protect me from exactly that?

Only if there is also proper locking to prevent a kref_get from being called 
when the release of a kref_put is in progress. And that's missing in cdev.

Typical scenario: a USB device is disconnected, the driver sees that no 
applications are using it, then it calls cdev_del. This calls cdev_put, the 
cdev's refcount goes to 0 and it will call release. BUT, at this moment an 
application can open the device *again* and chrdev_open() will reuse the 
i_cdev pointer and call cdev_get(p) on it, and this increases the refcount 
from 0 to 1 again. And later it calls cdev_put again and release is called 
a second time.

>
> >>> Note the duplicate 'delayin' messages. Also note that the cdev struct
> >>> was allocated by sg.c, so the second cdev cleanup will likely poke
> >>> into already freed memory.
> >>>
> >>> I tried to reproduce it as well without the patched char_dev.c by
> >>> using this little program:
> >>>
> >>> #include <stdio.h>
> >>> #include <unistd.h>
> >>> #include <fcntl.h>
> >>>
> >>> main()
> >>> {
> >>>         int state = 0;
> >>>
> >>>         while (1) {
> >>>                 int fh = open("./sg8", O_RDONLY);
> >>>                 if (fh >= 0) {
> >>>                         if (state == 0) {
> >>>                                 printf("opened\n");
> >>>                                 state = 1;
> >>>                         }
> >>>                         close(fh);
> >>>                 }
> >>>                 else {
> >>>                         if (state) {
> >>>                                 printf("error\n");
> >>>                                 state = 0;
> >>>                         }
> >>>                 }
> >>>         }
> >>> }
> >>>
> >>> Just run it and connect/disconnect the USB harddisk.
> >>>
> >>> Unfortunately, I wasn't able to reproduce it because I always hit a
> >>> bug in khelper first after only a few attempts:
> >>>
> >>> sd 13:0:0:0: [sdg] Assuming drive cache: write through
> >>> sdg: sdg1
> >>> sd 13:0:0:0: [sdg] Attached SCSI disk
> >>> sd 13:0:0:0: Attached scsi generic sg8 type 0
> >>> usb-storage: device scan complete
> >>> usb 2-4.1: USB disconnect, address 13
> >>> VFS: Close: file count is 0
> >>> BUG: unable to handle kernel NULL pointer dereference at 00000004
> >>> IP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25
> >>> *pdpt = 000000002ebb5001 *pde = 0000000000000000
> >>> Oops: 0002 [#1] PREEMPT SMP
> >>> Modules linked in: nvidia(P)
> >>>
> >>> Pid: 13662, comm: khelper Tainted: P          (2.6.26-durdane #1)
> >>> EIP: 0060:[<c0409df2>] EFLAGS: 00010002 CPU: 1
> >>> EIP is at _spin_lock_irqsave+0x13/0x25
> >>> EAX: 00000004 EBX: 00000000 ECX: 00000213 EDX: 00000100
> >>> ESI: 00000004 EDI: 00000000 EBP: 00000000 ESP: eeabffc8
> >>> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> >>> Process khelper (pid: 13662, ti=eeabe000 task=eeb280c0
> >>> task.ti=eeabe000) Stack: c012e788 ee8fb300 00000000 00000000 c014144c
> >>> 00000000 c01413ec c01194ab
> >>> ee8fb300 00000000 00000000 00000000 00000000 00000000
> >>> Call Trace:
> >>> [<c012e788>] complete+0xf/0x36
> >>> [<c014144c>] wait_for_helper+0x60/0x65
> >>> [<c01413ec>] wait_for_helper+0x0/0x65
> >>> [<c01194ab>] kernel_thread_helper+0x7/0x10
> >>> =======================
> >>> Code: 90 89 e2 81 e2 00 e0 ff ff ff 42 14 f0 83 28 01 79 05 e8 ae ff
> >>> ff ff c3 9c 59 fa 89 e2 81 e2 00 e0 ff ff ff 42 14 ba 00 01 00 00
> >>> <f0> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 c3 fa 89 e2
> >>> EIP: [<c0409df2>] _spin_lock_irqsave+0x13/0x25 SS:ESP 0068:eeabffc8
> >>> ---[ end trace f5d38a13beb6dc77 ]---
> >>> note: khelper[13662] exited with preempt_count 1
> >>>
> >>> I tried the same on a 2.6.28-rc8 kernel and got the same khelper
> >>> error. Lovely.
> >>>
> >>> So, my conclusion remains that cdev has a race condition due to
> >>> insufficient locking between cdev_del/put and cdev_get. And that
> >>> khelper has it's own problems as well :-(
> >>>
> >>> Regards,
> >>>
> >>> 	Hans
> >>
> >> Hi Hans
> >>
> >> Just as a far shot could you try this patch:
> >>
> >> [PATCH RESEND] char_dev: add cdev->release() and convert cdev_alloc()
> >> to use it http://www.spinics.net/lists/kernel/msg808783.html
> >>
> >> Your goals look related. Also see the lkml thread.
> >>
> >> Please cc me on the matter.
> >
> > Hi Boaz,
> >
> > It will not fix this particular race condition, but it would make it
> > easier to hook the v4l release callback into cdev. It's a bit of a hack
> > at the moment. Greg clearly doesn't like drivers doing things like
> > this, but it's still not clear to me why not. As far as I can tell it
> > is the only reliable place where you can be 100% certain that it is
> > safe to do the cleanup since no one else is or can use the char device
> > ever again.
> >
> > Regards,
> >
> > 	Hans
>
> I had a similar problem and solved it in a way that I think should be
> safe. I'll try and run your tests to make sure. Here is a short
> description of my solution:
>
> MyDevice {
>    ...
>    embed_cdev;
>    embed_kref;
>    ...
> };
>
> OnHotPlug() {
> 	...
> 	kref_init(embed_kref); // kref==1
> 	cdev_add(embed_cdev);  // cdev==1
> 	get_(embed_cdev);      // cdev++==2
> }
>
> OnFileOpen() { // kernel does // cdev++==n > 2
> 	kref_get(embed_kref); // kref++==n > 1
> }
>
> OnFileClose() {
> 	kref_put(embed_kref, __release); // kref++==n >= 0
> } // kernel does // cdev--==n >= 1
>
> OnHotRemove() {
> 	cdev_del(embed_cdev);  // cdev--==1
> 	kref_put(embed_kref, __release); // kref--==n >= 0
> }
>
> __release() { // by definition kref==0
> 	put_(embed_cdev);      // cdev--==0
> }
>
> What do you think?

This won't help either. __release calls cdev_put, the cdev refcount goes to 
0, the cdev release is called, but at this time someone can open the device 
again. Refcounting in the driver simply won't help since chrdev_open is 
always called before the driver has a chance to check anything.

In addition, I think it is nuts to introduce a kref that just shadows the 
cdev's kref. We should be able to rely on cdev for this.

I expect that when you test your driver with this you should hit the same 
race condition. Remember that you need to mknod the device. If you rely on 
udev then this will never happen because udev has removed the device node 
before cdev_del is called. (At least, I think that is always true. I'm no 
expert on this.)

> BTW sg does not use kref and I suspect it might be racy by its own.

It might, but it has nothing to do with this bug.

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 17:33                     ` Hans Verkuil
  (?)
@ 2008-12-17 18:08                     ` Al Viro
  2008-12-18  8:12                       ` Boaz Harrosh
  -1 siblings, 1 reply; 35+ messages in thread
From: Al Viro @ 2008-12-17 18:08 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Boaz Harrosh, Greg KH, linux-kernel, v4l, Laurent Pinchart, Tejun Heo

On Wed, Dec 17, 2008 at 06:33:38PM +0100, Hans Verkuil wrote:
> On Wednesday 17 December 2008 17:09:27 Boaz Harrosh wrote:
> > Hans Verkuil wrote:
> > > On Wednesday 17 December 2008 15:52:38 Boaz Harrosh wrote:
> > >> Hans Verkuil wrote:
> > >>> On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
> > >>>> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
> > >>>>> On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
> > >>>>>> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
> > >>>>>>> On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
> > >>>>>>>> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
> > >>>>>>>>> Hi Greg,

[snip 400-odd lines, with ~30 lines of original contents]

Could the esteemed sir kindly consider trimming the fucking crap or taking
that to alt.cascade?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 13:37             ` Hans Verkuil
  (?)
  (?)
@ 2008-12-17 18:16             ` Greg KH
  2008-12-17 19:27                 ` Laurent Pinchart
  2008-12-17 19:30                 ` Hans Verkuil
  -1 siblings, 2 replies; 35+ messages in thread
From: Greg KH @ 2008-12-17 18:16 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-kernel, v4l, Laurent Pinchart

On Wed, Dec 17, 2008 at 02:37:33PM +0100, Hans Verkuil wrote:
> > Again, don't use cdev's reference counting for your own object
> > lifecycle, it is different and will cause problems, like you have found
> > out.
> 
> Sigh. It has nothing to do with how v4l uses it. And to demonstrate this, 
> here is how you reproduce it with the sg module (tested it with my USB 
> harddisk).
> 
> 1) apply this patch to char_dev.c:

<snip>

Ok, since I can't convince you that using a cdev for your reference
counting is incorrect, I'll have to go change the cdev code to prevent
you from doing this :(

Anyway, do you have a patch for the cdev code to propose how to fix this
issue you are having?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 18:16             ` Greg KH
@ 2008-12-17 19:27                 ` Laurent Pinchart
  2008-12-17 19:30                 ` Hans Verkuil
  1 sibling, 0 replies; 35+ messages in thread
From: Laurent Pinchart @ 2008-12-17 19:27 UTC (permalink / raw)
  To: Greg KH; +Cc: Hans Verkuil, linux-kernel, v4l

Hi Greg,

On Wednesday 17 December 2008, Greg KH wrote:
> On Wed, Dec 17, 2008 at 02:37:33PM +0100, Hans Verkuil wrote:
> > > Again, don't use cdev's reference counting for your own object
> > > lifecycle, it is different and will cause problems, like you have found
> > > out.
> >
> > Sigh. It has nothing to do with how v4l uses it. And to demonstrate this,
> > here is how you reproduce it with the sg module (tested it with my USB
> > harddisk).
> >
> > 1) apply this patch to char_dev.c:
>
> <snip>
>
> Ok, since I can't convince you that using a cdev for your reference
> counting is incorrect, I'll have to go change the cdev code to prevent
> you from doing this :(

Don't give up yet :-)

As v4l isn't the only kernel subsystem wrongly using cdev (Hans showed that sg 
also suffered from race conditions), people seem not to understand cdev 
properly. Maybe you should start by explaining what cdev has been designed to 
handle and how to use it in device drivers (such as sg or v4l) instead of 
telling us what not to do.

> Anyway, do you have a patch for the cdev code to propose how to fix this
> issue you are having?

Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-17 19:27                 ` Laurent Pinchart
  0 siblings, 0 replies; 35+ messages in thread
From: Laurent Pinchart @ 2008-12-17 19:27 UTC (permalink / raw)
  To: Greg KH; +Cc: v4l, linux-kernel

Hi Greg,

On Wednesday 17 December 2008, Greg KH wrote:
> On Wed, Dec 17, 2008 at 02:37:33PM +0100, Hans Verkuil wrote:
> > > Again, don't use cdev's reference counting for your own object
> > > lifecycle, it is different and will cause problems, like you have found
> > > out.
> >
> > Sigh. It has nothing to do with how v4l uses it. And to demonstrate this,
> > here is how you reproduce it with the sg module (tested it with my USB
> > harddisk).
> >
> > 1) apply this patch to char_dev.c:
>
> <snip>
>
> Ok, since I can't convince you that using a cdev for your reference
> counting is incorrect, I'll have to go change the cdev code to prevent
> you from doing this :(

Don't give up yet :-)

As v4l isn't the only kernel subsystem wrongly using cdev (Hans showed that sg 
also suffered from race conditions), people seem not to understand cdev 
properly. Maybe you should start by explaining what cdev has been designed to 
handle and how to use it in device drivers (such as sg or v4l) instead of 
telling us what not to do.

> Anyway, do you have a patch for the cdev code to propose how to fix this
> issue you are having?

Regards,

Laurent Pinchart

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 18:16             ` Greg KH
@ 2008-12-17 19:30                 ` Hans Verkuil
  2008-12-17 19:30                 ` Hans Verkuil
  1 sibling, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 19:30 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-kernel, v4l, Laurent Pinchart

On Wednesday 17 December 2008 19:16:45 Greg KH wrote:
> On Wed, Dec 17, 2008 at 02:37:33PM +0100, Hans Verkuil wrote:
> > > Again, don't use cdev's reference counting for your own object
> > > lifecycle, it is different and will cause problems, like you have
> > > found out.
> >
> > Sigh. It has nothing to do with how v4l uses it. And to demonstrate
> > this, here is how you reproduce it with the sg module (tested it with
> > my USB harddisk).
> >
> > 1) apply this patch to char_dev.c:
>
> <snip>
>
> Ok, since I can't convince you that using a cdev for your reference
> counting is incorrect, I'll have to go change the cdev code to prevent
> you from doing this :(

Erm, you haven't told me yet why it's a bad idea. You just said "don't do 
it", but I haven't seen the reason for it. There doesn't seem to be any 
documentation on how to properly use cdev besides the Kernel Device Drivers 
book, which (if memory serves) doesn't mention anything on this topic.

I really don't mind implementing refcounting in the v4l framework, I just 
want to understand why it should be done like that!

It seems to me that I will just be shadowing the refcounting of cdev if I 
implement refcounting in v4l: init to 1 on creation, increase on open, 
decrease on close, decrease on deletion. It's all terribly familiar...

> Anyway, do you have a patch for the cdev code to propose how to fix this
> issue you are having?

Sure, here it is:

--- fs/char_dev.c.orig	2008-12-17 20:28:40.000000000 +0100
+++ fs/char_dev.c	2008-12-17 20:28:49.000000000 +0100
@@ -345,7 +345,9 @@
 {
 	if (p) {
 		struct module *owner = p->owner;
+		spin_lock(&cdev_lock);
 		kobject_put(&p->kobj);
+		spin_unlock(&cdev_lock);
 		module_put(owner);
 	}
 }
@@ -415,14 +417,12 @@
 
 static void cdev_purge(struct cdev *cdev)
 {
-	spin_lock(&cdev_lock);
 	while (!list_empty(&cdev->list)) {
 		struct inode *inode;
 		inode = container_of(cdev->list.next, struct inode, i_devices);
 		list_del_init(&inode->i_devices);
 		inode->i_cdev = NULL;
 	}
-	spin_unlock(&cdev_lock);
 }
 
 /*
@@ -478,7 +478,9 @@
 void cdev_del(struct cdev *p)
 {
 	cdev_unmap(p->dev, p->count);
+	spin_lock(&cdev_lock);
 	kobject_put(&p->kobj);
+	spin_unlock(&cdev_lock);
 }

This solves this particular problem. But this will certainly break v4l as it 
is right now, since the spin_lock means that the kref's release cannot do 
any sleeps, which is possible in v4l. If we want to allow that in cdev, 
then the spinlock has to be replaced by a mutex. But I have the strong 
feeling that that's not going to happen :-)

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-17 19:30                 ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 19:30 UTC (permalink / raw)
  To: Greg KH; +Cc: v4l, linux-kernel

On Wednesday 17 December 2008 19:16:45 Greg KH wrote:
> On Wed, Dec 17, 2008 at 02:37:33PM +0100, Hans Verkuil wrote:
> > > Again, don't use cdev's reference counting for your own object
> > > lifecycle, it is different and will cause problems, like you have
> > > found out.
> >
> > Sigh. It has nothing to do with how v4l uses it. And to demonstrate
> > this, here is how you reproduce it with the sg module (tested it with
> > my USB harddisk).
> >
> > 1) apply this patch to char_dev.c:
>
> <snip>
>
> Ok, since I can't convince you that using a cdev for your reference
> counting is incorrect, I'll have to go change the cdev code to prevent
> you from doing this :(

Erm, you haven't told me yet why it's a bad idea. You just said "don't do 
it", but I haven't seen the reason for it. There doesn't seem to be any 
documentation on how to properly use cdev besides the Kernel Device Drivers 
book, which (if memory serves) doesn't mention anything on this topic.

I really don't mind implementing refcounting in the v4l framework, I just 
want to understand why it should be done like that!

It seems to me that I will just be shadowing the refcounting of cdev if I 
implement refcounting in v4l: init to 1 on creation, increase on open, 
decrease on close, decrease on deletion. It's all terribly familiar...

> Anyway, do you have a patch for the cdev code to propose how to fix this
> issue you are having?

Sure, here it is:

--- fs/char_dev.c.orig	2008-12-17 20:28:40.000000000 +0100
+++ fs/char_dev.c	2008-12-17 20:28:49.000000000 +0100
@@ -345,7 +345,9 @@
 {
 	if (p) {
 		struct module *owner = p->owner;
+		spin_lock(&cdev_lock);
 		kobject_put(&p->kobj);
+		spin_unlock(&cdev_lock);
 		module_put(owner);
 	}
 }
@@ -415,14 +417,12 @@
 
 static void cdev_purge(struct cdev *cdev)
 {
-	spin_lock(&cdev_lock);
 	while (!list_empty(&cdev->list)) {
 		struct inode *inode;
 		inode = container_of(cdev->list.next, struct inode, i_devices);
 		list_del_init(&inode->i_devices);
 		inode->i_cdev = NULL;
 	}
-	spin_unlock(&cdev_lock);
 }
 
 /*
@@ -478,7 +478,9 @@
 void cdev_del(struct cdev *p)
 {
 	cdev_unmap(p->dev, p->count);
+	spin_lock(&cdev_lock);
 	kobject_put(&p->kobj);
+	spin_unlock(&cdev_lock);
 }

This solves this particular problem. But this will certainly break v4l as it 
is right now, since the spin_lock means that the kref's release cannot do 
any sleeps, which is possible in v4l. If we want to allow that in cdev, 
then the spinlock has to be replaced by a mutex. But I have the strong 
feeling that that's not going to happen :-)

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 19:27                 ` Laurent Pinchart
  (?)
@ 2008-12-17 19:35                 ` Greg KH
  -1 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2008-12-17 19:35 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Hans Verkuil, linux-kernel, v4l

On Wed, Dec 17, 2008 at 08:27:13PM +0100, Laurent Pinchart wrote:
> Hi Greg,
> 
> On Wednesday 17 December 2008, Greg KH wrote:
> > On Wed, Dec 17, 2008 at 02:37:33PM +0100, Hans Verkuil wrote:
> > > > Again, don't use cdev's reference counting for your own object
> > > > lifecycle, it is different and will cause problems, like you have found
> > > > out.
> > >
> > > Sigh. It has nothing to do with how v4l uses it. And to demonstrate this,
> > > here is how you reproduce it with the sg module (tested it with my USB
> > > harddisk).
> > >
> > > 1) apply this patch to char_dev.c:
> >
> > <snip>
> >
> > Ok, since I can't convince you that using a cdev for your reference
> > counting is incorrect, I'll have to go change the cdev code to prevent
> > you from doing this :(
> 
> Don't give up yet :-)
> 
> As v4l isn't the only kernel subsystem wrongly using cdev (Hans showed that sg 
> also suffered from race conditions), people seem not to understand cdev 
> properly.

I totally agree, it is not the most easily understood chunk of code.

> Maybe you should start by explaining what cdev has been designed to
> handle and how to use it in device drivers (such as sg or v4l) instead
> of telling us what not to do.

As I didn't create this code, but for some reason seem to maintain it,
I'll work on creating a document to do this.  But as the holidays are
rapidly approaching, it might be a while before I can get to it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 19:30                 ` Hans Verkuil
  (?)
@ 2008-12-17 19:38                 ` Greg KH
  -1 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2008-12-17 19:38 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-kernel, v4l, Laurent Pinchart

On Wed, Dec 17, 2008 at 08:30:32PM +0100, Hans Verkuil wrote:
> On Wednesday 17 December 2008 19:16:45 Greg KH wrote:
> > On Wed, Dec 17, 2008 at 02:37:33PM +0100, Hans Verkuil wrote:
> > > > Again, don't use cdev's reference counting for your own object
> > > > lifecycle, it is different and will cause problems, like you have
> > > > found out.
> > >
> > > Sigh. It has nothing to do with how v4l uses it. And to demonstrate
> > > this, here is how you reproduce it with the sg module (tested it with
> > > my USB harddisk).
> > >
> > > 1) apply this patch to char_dev.c:
> >
> > <snip>
> >
> > Ok, since I can't convince you that using a cdev for your reference
> > counting is incorrect, I'll have to go change the cdev code to prevent
> > you from doing this :(
> 
> Erm, you haven't told me yet why it's a bad idea. You just said "don't do 
> it", but I haven't seen the reason for it. There doesn't seem to be any 
> documentation on how to properly use cdev besides the Kernel Device Drivers 
> book, which (if memory serves) doesn't mention anything on this topic.

The reason is that it was not designed to do such a thing.  It was
designed to handle the character device lookup, and that's it.

> I really don't mind implementing refcounting in the v4l framework, I just 
> want to understand why it should be done like that!

Because the lifecycle of your device should be separate from the
lifecycle of the character device node, you should not tie the two
together.

> It seems to me that I will just be shadowing the refcounting of cdev if I 
> implement refcounting in v4l: init to 1 on creation, increase on open, 
> decrease on close, decrease on deletion. It's all terribly familiar...

You shouldn't, odds are there are other ways to grab the reference on
your device from outside of the character node, right?  Like the USB
probe/disconnect path?  So you should have an independant reference
count in order to handle all of the different ways the device can be
accessed.

> > Anyway, do you have a patch for the cdev code to propose how to fix this
> > issue you are having?
> 
> Sure, here it is:
> 
> --- fs/char_dev.c.orig	2008-12-17 20:28:40.000000000 +0100
> +++ fs/char_dev.c	2008-12-17 20:28:49.000000000 +0100
> @@ -345,7 +345,9 @@
>  {
>  	if (p) {
>  		struct module *owner = p->owner;
> +		spin_lock(&cdev_lock);
>  		kobject_put(&p->kobj);
> +		spin_unlock(&cdev_lock);
>  		module_put(owner);
>  	}
>  }
> @@ -415,14 +417,12 @@
>  
>  static void cdev_purge(struct cdev *cdev)
>  {
> -	spin_lock(&cdev_lock);
>  	while (!list_empty(&cdev->list)) {
>  		struct inode *inode;
>  		inode = container_of(cdev->list.next, struct inode, i_devices);
>  		list_del_init(&inode->i_devices);
>  		inode->i_cdev = NULL;
>  	}
> -	spin_unlock(&cdev_lock);
>  }
>  
>  /*
> @@ -478,7 +478,9 @@
>  void cdev_del(struct cdev *p)
>  {
>  	cdev_unmap(p->dev, p->count);
> +	spin_lock(&cdev_lock);
>  	kobject_put(&p->kobj);
> +	spin_unlock(&cdev_lock);
>  }
> 
> This solves this particular problem. But this will certainly break v4l as it 
> is right now, since the spin_lock means that the kref's release cannot do 
> any sleeps, which is possible in v4l. If we want to allow that in cdev, 
> then the spinlock has to be replaced by a mutex. But I have the strong 
> feeling that that's not going to happen :-)

True :)

I'll be glad to queue this change up for 2.6.29 if you are going to fix
up the v4l code to not break if I add this code.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 19:30                 ` Hans Verkuil
@ 2008-12-17 19:39                   ` Hans Verkuil
  -1 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 19:39 UTC (permalink / raw)
  To: video4linux-list; +Cc: Greg KH, linux-kernel

On Wednesday 17 December 2008 20:30:32 Hans Verkuil wrote:
>
> This solves this particular problem. But this will certainly break v4l as
> it is right now, since the spin_lock means that the kref's release cannot
> do any sleeps, which is possible in v4l. If we want to allow that in
> cdev, then the spinlock has to be replaced by a mutex. But I have the
> strong feeling that that's not going to happen :-)

Note that if we ever allow drivers to hook in their own release callback, 
then we certainly should switch to a mutex in the cdev struct, rather than 
a global mutex. It obviously makes life more complicated for cdev, but much 
easier for drivers.

Regards,

 	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-17 19:39                   ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 19:39 UTC (permalink / raw)
  To: video4linux-list; +Cc: Greg KH, linux-kernel

On Wednesday 17 December 2008 20:30:32 Hans Verkuil wrote:
>
> This solves this particular problem. But this will certainly break v4l as
> it is right now, since the spin_lock means that the kref's release cannot
> do any sleeps, which is possible in v4l. If we want to allow that in
> cdev, then the spinlock has to be replaced by a mutex. But I have the
> strong feeling that that's not going to happen :-)

Note that if we ever allow drivers to hook in their own release callback, 
then we certainly should switch to a mutex in the cdev struct, rather than 
a global mutex. It obviously makes life more complicated for cdev, but much 
easier for drivers.

Regards,

 	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 19:39                   ` Hans Verkuil
  (?)
@ 2008-12-17 19:53                   ` Greg KH
  2008-12-17 20:18                       ` Hans Verkuil
  -1 siblings, 1 reply; 35+ messages in thread
From: Greg KH @ 2008-12-17 19:53 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: video4linux-list, linux-kernel

On Wed, Dec 17, 2008 at 08:39:03PM +0100, Hans Verkuil wrote:
> On Wednesday 17 December 2008 20:30:32 Hans Verkuil wrote:
> >
> > This solves this particular problem. But this will certainly break v4l as
> > it is right now, since the spin_lock means that the kref's release cannot
> > do any sleeps, which is possible in v4l. If we want to allow that in
> > cdev, then the spinlock has to be replaced by a mutex. But I have the
> > strong feeling that that's not going to happen :-)
> 
> Note that if we ever allow drivers to hook in their own release callback, 
> then we certainly should switch to a mutex in the cdev struct, rather than 
> a global mutex. It obviously makes life more complicated for cdev, but much 
> easier for drivers.

I don't see it being easier for drivers, you should provide this kind of
infrastructure within your framework already.

Actually, we already do provide this kind of framework, what's wrong
with using "struct device" for this, like the rest of the kernel does?
That is the device you need to be doing the reference counting and
release code for, it is exactly what it is there for.

So why is V4L different than the rest of the kernel in that it wishes to
do things differently?

confused,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 19:53                   ` Greg KH
@ 2008-12-17 20:18                       ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 20:18 UTC (permalink / raw)
  To: Greg KH; +Cc: video4linux-list, linux-kernel

On Wednesday 17 December 2008 20:53:29 Greg KH wrote:
> On Wed, Dec 17, 2008 at 08:39:03PM +0100, Hans Verkuil wrote:
> > On Wednesday 17 December 2008 20:30:32 Hans Verkuil wrote:
> > > This solves this particular problem. But this will certainly break
> > > v4l as it is right now, since the spin_lock means that the kref's
> > > release cannot do any sleeps, which is possible in v4l. If we want to
> > > allow that in cdev, then the spinlock has to be replaced by a mutex.
> > > But I have the strong feeling that that's not going to happen :-)
> >
> > Note that if we ever allow drivers to hook in their own release
> > callback, then we certainly should switch to a mutex in the cdev
> > struct, rather than a global mutex. It obviously makes life more
> > complicated for cdev, but much easier for drivers.
>
> I don't see it being easier for drivers, you should provide this kind of
> infrastructure within your framework already.
>
> Actually, we already do provide this kind of framework, what's wrong
> with using "struct device" for this, like the rest of the kernel does?
> That is the device you need to be doing the reference counting and
> release code for, it is exactly what it is there for.
>
> So why is V4L different than the rest of the kernel in that it wishes to
> do things differently?

Because it has almost no proper framework to speak of and what little there 
is has been pretty much unchanged since the very beginning.

I'm trying to develop a decent framework that should help support upcoming 
devices and generally make life easier for v4l driver developers.

And I've no idea why we don't just use the device's release() callback for 
this. I'm going to implement this right now :-)

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
@ 2008-12-17 20:18                       ` Hans Verkuil
  0 siblings, 0 replies; 35+ messages in thread
From: Hans Verkuil @ 2008-12-17 20:18 UTC (permalink / raw)
  To: Greg KH; +Cc: video4linux-list, linux-kernel

On Wednesday 17 December 2008 20:53:29 Greg KH wrote:
> On Wed, Dec 17, 2008 at 08:39:03PM +0100, Hans Verkuil wrote:
> > On Wednesday 17 December 2008 20:30:32 Hans Verkuil wrote:
> > > This solves this particular problem. But this will certainly break
> > > v4l as it is right now, since the spin_lock means that the kref's
> > > release cannot do any sleeps, which is possible in v4l. If we want to
> > > allow that in cdev, then the spinlock has to be replaced by a mutex.
> > > But I have the strong feeling that that's not going to happen :-)
> >
> > Note that if we ever allow drivers to hook in their own release
> > callback, then we certainly should switch to a mutex in the cdev
> > struct, rather than a global mutex. It obviously makes life more
> > complicated for cdev, but much easier for drivers.
>
> I don't see it being easier for drivers, you should provide this kind of
> infrastructure within your framework already.
>
> Actually, we already do provide this kind of framework, what's wrong
> with using "struct device" for this, like the rest of the kernel does?
> That is the device you need to be doing the reference counting and
> release code for, it is exactly what it is there for.
>
> So why is V4L different than the rest of the kernel in that it wishes to
> do things differently?

Because it has almost no proper framework to speak of and what little there 
is has been pretty much unchanged since the very beginning.

I'm trying to develop a decent framework that should help support upcoming 
devices and generally make life easier for v4l driver developers.

And I've no idea why we don't just use the device's release() callback for 
this. I'm going to implement this right now :-)

Regards,

	Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG

--
video4linux-list mailing list
Unsubscribe mailto:video4linux-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/video4linux-list

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 20:18                       ` Hans Verkuil
  (?)
@ 2008-12-17 20:52                       ` Greg KH
  -1 siblings, 0 replies; 35+ messages in thread
From: Greg KH @ 2008-12-17 20:52 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: video4linux-list, linux-kernel

On Wed, Dec 17, 2008 at 09:18:29PM +0100, Hans Verkuil wrote:
> On Wednesday 17 December 2008 20:53:29 Greg KH wrote:
> > So why is V4L different than the rest of the kernel in that it wishes to
> > do things differently?
> 
> Because it has almost no proper framework to speak of and what little there 
> is has been pretty much unchanged since the very beginning.

Ick.

> I'm trying to develop a decent framework that should help support upcoming 
> devices and generally make life easier for v4l driver developers.
> 
> And I've no idea why we don't just use the device's release() callback for 
> this. I'm going to implement this right now :-)

Thanks, that makes oh so much more sense.  Let me know when you've
completed it so I can make your cdev change as well.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 18:08                     ` Al Viro
@ 2008-12-18  8:12                       ` Boaz Harrosh
  0 siblings, 0 replies; 35+ messages in thread
From: Boaz Harrosh @ 2008-12-18  8:12 UTC (permalink / raw)
  To: Al Viro
  Cc: Hans Verkuil, Greg KH, linux-kernel, v4l, Laurent Pinchart, Tejun Heo

Al Viro wrote:
> On Wed, Dec 17, 2008 at 06:33:38PM +0100, Hans Verkuil wrote:
>> On Wednesday 17 December 2008 17:09:27 Boaz Harrosh wrote:
>>> Hans Verkuil wrote:
>>>> On Wednesday 17 December 2008 15:52:38 Boaz Harrosh wrote:
>>>>> Hans Verkuil wrote:
>>>>>> On Wednesday 17 December 2008 00:30:39 Greg KH wrote:
>>>>>>> On Wed, Dec 17, 2008 at 12:23:41AM +0100, Hans Verkuil wrote:
>>>>>>>> On Tuesday 16 December 2008 22:21:50 Greg KH wrote:
>>>>>>>>> On Tue, Dec 16, 2008 at 10:00:51PM +0100, Hans Verkuil wrote:
>>>>>>>>>> On Tuesday 16 December 2008 21:22:48 Greg KH wrote:
>>>>>>>>>>> On Mon, Dec 08, 2008 at 09:56:26PM +0100, Hans Verkuil wrote:
>>>>>>>>>>>> Hi Greg,
> 
> [snip 400-odd lines, with ~30 lines of original contents]
> 
> Could the esteemed sir kindly consider trimming the fucking crap or taking
> that to alt.cascade?
> --

Sorry, I apologize for that. I was CCing someone new and felt I might need
to leave the context of the conversation. But you are right it was getting
out of hand.

Boaz

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG] cdev_put() race condition
  2008-12-17 17:33                     ` Hans Verkuil
  (?)
  (?)
@ 2008-12-18  8:25                     ` Boaz Harrosh
  -1 siblings, 0 replies; 35+ messages in thread
From: Boaz Harrosh @ 2008-12-18  8:25 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: Greg KH, linux-kernel, v4l, Laurent Pinchart, Tejun Heo

Hans Verkuil wrote:
>>>>> delayin
>>>>> delayout
>>>>> delayout
>> OK Now I do not understand something. How is the double release possible?
>> I guess it is something to do with the complicated chrdev_open() in its
>> inode->i_cdev == NULL case. But still isn't the kref inside kobject
>> suppose to protect me from exactly that?
> 
> Only if there is also proper locking to prevent a kref_get from being called 
> when the release of a kref_put is in progress. And that's missing in cdev.
> 
> Typical scenario: a USB device is disconnected, the driver sees that no 
> applications are using it, then it calls cdev_del. This calls cdev_put, the 
> cdev's refcount goes to 0 and it will call release. BUT, at this moment an 
> application can open the device *again* and chrdev_open() will reuse the 
> i_cdev pointer and call cdev_get(p) on it, and this increases the refcount 
> from 0 to 1 again. And later it calls cdev_put again and release is called 
> a second time.

OK I see. 

> 
>>>>> Note the duplicate 'delayin' messages. Also note that the cdev struct
>>>>> was allocated by sg.c, so the second cdev cleanup will likely poke
>>>>> into already freed memory.
>>>>>

>> I had a similar problem and solved it in a way that I think should be
>> safe. I'll try and run your tests to make sure. Here is a short
>> description of my solution:
>>
>> MyDevice {
>>    ...
>>    embed_cdev;
>>    embed_kref;
>>    ...
>> };
>>
>> OnHotPlug() {
>> 	...
>> 	kref_init(embed_kref); // kref==1
>> 	cdev_add(embed_cdev);  // cdev==1
>> 	get_(embed_cdev);      // cdev++==2
>> }
>>
>> OnFileOpen() { // kernel does // cdev++==n > 2
>> 	kref_get(embed_kref); // kref++==n > 1
>> }
>>
>> OnFileClose() {
>> 	kref_put(embed_kref, __release); // kref++==n >= 0
>> } // kernel does // cdev--==n >= 1
>>
>> OnHotRemove() {
>> 	cdev_del(embed_cdev);  // cdev--==1
>> 	kref_put(embed_kref, __release); // kref--==n >= 0
>> }
>>
>> __release() { // by definition kref==0
>> 	put_(embed_cdev);      // cdev--==0
>> }
>>
>> What do you think?
> 
> This won't help either. __release calls cdev_put, the cdev refcount goes to 
> 0, the cdev release is called, but at this time someone can open the device 
> again. Refcounting in the driver simply won't help since chrdev_open is 
> always called before the driver has a chance to check anything.
> 

No, But at this point cdev_del has already been called before the final put,
so if a chrdev_open is called while in cdev_release it will not find my
device anymore. I have separated the unmapping of the device from it's final put.

> In addition, I think it is nuts to introduce a kref that just shadows the 
> cdev's kref. We should be able to rely on cdev for this.
> 

I agree, that's why I like Tejun's patch because in theory we can get
rid of the shadow kref. (Without a layering violation)

> I expect that when you test your driver with this you should hit the same 
> race condition. Remember that you need to mknod the device. If you rely on 
> udev then this will never happen because udev has removed the device node 
> before cdev_del is called. (At least, I think that is always true. I'm no 
> expert on this.)
> 

OK I see, I do use udev in all my fedora images. I will try to disable it
and test. So far I was unable to reproduce the problem with my device.

>> BTW sg does not use kref and I suspect it might be racy by its own.
> 
> It might, but it has nothing to do with this bug.
> 
> Regards,
> 
> 	Hans
> 

Thanks
Boaz

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2008-12-18  8:26 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-12-08 20:56 [BUG] cdev_put() race condition Hans Verkuil
2008-12-08 20:56 ` Hans Verkuil
2008-12-16 10:06 ` Hans Verkuil
2008-12-16 10:06   ` Hans Verkuil
2008-12-16 20:22 ` Greg KH
2008-12-16 21:00   ` Hans Verkuil
2008-12-16 21:00     ` Hans Verkuil
2008-12-16 21:21     ` Greg KH
2008-12-16 23:23       ` Hans Verkuil
2008-12-16 23:23         ` Hans Verkuil
2008-12-16 23:30         ` Greg KH
2008-12-17 13:37           ` Hans Verkuil
2008-12-17 13:37             ` Hans Verkuil
2008-12-17 14:52             ` Boaz Harrosh
2008-12-17 15:07               ` Hans Verkuil
2008-12-17 15:07                 ` Hans Verkuil
2008-12-17 16:09                 ` Boaz Harrosh
2008-12-17 17:33                   ` Hans Verkuil
2008-12-17 17:33                     ` Hans Verkuil
2008-12-17 18:08                     ` Al Viro
2008-12-18  8:12                       ` Boaz Harrosh
2008-12-18  8:25                     ` Boaz Harrosh
2008-12-17 18:16             ` Greg KH
2008-12-17 19:27               ` Laurent Pinchart
2008-12-17 19:27                 ` Laurent Pinchart
2008-12-17 19:35                 ` Greg KH
2008-12-17 19:30               ` Hans Verkuil
2008-12-17 19:30                 ` Hans Verkuil
2008-12-17 19:38                 ` Greg KH
2008-12-17 19:39                 ` Hans Verkuil
2008-12-17 19:39                   ` Hans Verkuil
2008-12-17 19:53                   ` Greg KH
2008-12-17 20:18                     ` Hans Verkuil
2008-12-17 20:18                       ` Hans Verkuil
2008-12-17 20:52                       ` Greg KH

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.