All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
@ 2013-08-02 12:27 Sylwester Nawrocki
  2013-08-02 13:00 ` Hans Verkuil
  0 siblings, 1 reply; 18+ messages in thread
From: Sylwester Nawrocki @ 2013-08-02 12:27 UTC (permalink / raw)
  To: linux-media; +Cc: hverkuil, laurent.pinchart, Sylwester Nawrocki, Kyungmin Park

As it currently stands this code doesn't protect against any races
between video device open() and its unregistration. Races could be
avoided by doing the video_is_registered() check protected by the
core mutex, while the video device unregistration is also done with
this mutex held.

The history of this code is that the second video_is_registered()
call has been added in commit ee6869afc922a9849979e49bb3bbcad7948
"V4L/DVB: v4l2: add core serialization lock" together with addition
of the core mutex support in fops:

        mutex_unlock(&videodev_lock);
-       if (vdev->fops->open)
-               ret = vdev->fops->open(filp);
+       if (vdev->fops->open) {
+               if (vdev->lock)
+                       mutex_lock(vdev->lock);
+               if (video_is_registered(vdev))
+                       ret = vdev->fops->open(filp);
+               else
+                       ret = -ENODEV;
+               if (vdev->lock)
+                       mutex_unlock(vdev->lock);
+       }

While commit cf5337358548b813479b58478539fc20ee86556c
"[media] v4l2-dev: remove V4L2_FL_LOCK_ALL_FOPS"
removed only code touching the mutex:

        mutex_unlock(&videodev_lock);
        if (vdev->fops->open) {
-               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags) &&
-                   mutex_lock_interruptible(vdev->lock)) {
-                       ret = -ERESTARTSYS;
-                       goto err;
-               }
                if (video_is_registered(vdev))
                        ret = vdev->fops->open(filp);
                else
                        ret = -ENODEV;
-               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags))
-                       mutex_unlock(vdev->lock);
        }

Remove the remaining video_is_registered() call as it doesn't provide
any real protection and just adds unnecessary overhead.

The drivers need to perform the unregistration check themselves inside
their file operation handlers, while holding respective mutex.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 drivers/media/v4l2-core/v4l2-dev.c |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
index c8859d6..1743119 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -444,13 +444,9 @@ static int v4l2_open(struct inode *inode, struct file *filp)
 	/* and increase the device refcount */
 	video_get(vdev);
 	mutex_unlock(&videodev_lock);
-	if (vdev->fops->open) {
-		if (video_is_registered(vdev))
-			ret = vdev->fops->open(filp);
-		else
-			ret = -ENODEV;
-	}

+	if (vdev->fops->open)
+		ret = vdev->fops->open(filp);
 	if (vdev->debug)
 		printk(KERN_DEBUG "%s: open (%d)\n",
 			video_device_node_name(vdev), ret);
--
1.7.9.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-08-02 12:27 [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open() Sylwester Nawrocki
@ 2013-08-02 13:00 ` Hans Verkuil
  2013-08-07 16:49   ` Sylwester Nawrocki
  0 siblings, 1 reply; 18+ messages in thread
From: Hans Verkuil @ 2013-08-02 13:00 UTC (permalink / raw)
  To: Sylwester Nawrocki; +Cc: linux-media, laurent.pinchart, Kyungmin Park

Hi Sylwester,

The patch is good, but I have some issues with the commit message itself.

On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
> As it currently stands this code doesn't protect against any races
> between video device open() and its unregistration. Races could be
> avoided by doing the video_is_registered() check protected by the
> core mutex, while the video device unregistration is also done with
> this mutex held.

The video_unregister_device() is called completely asynchronously,
particularly in the case of usb drivers. So it was never the goal of
the video_is_registered() to be fool proof, since that isn't possible,
nor is that necessary.

The goal was that the v4l2 core would use it for the various file
operations and ioctls as a quick check whether the device was unregistered
and to return the correct error. This prevents drivers from having to do
the same thing.

> The history of this code is that the second video_is_registered()
> call has been added in commit ee6869afc922a9849979e49bb3bbcad7948
> "V4L/DVB: v4l2: add core serialization lock" together with addition
> of the core mutex support in fops:
> 
>         mutex_unlock(&videodev_lock);
> -       if (vdev->fops->open)
> -               ret = vdev->fops->open(filp);
> +       if (vdev->fops->open) {
> +               if (vdev->lock)
> +                       mutex_lock(vdev->lock);
> +               if (video_is_registered(vdev))
> +                       ret = vdev->fops->open(filp);
> +               else
> +                       ret = -ENODEV;
> +               if (vdev->lock)
> +                       mutex_unlock(vdev->lock);
> +       }

The history is slightly more complicated: this commit moved the video_is_registered
call from before the mutex_unlock(&videodev_lock); to just before the fops->open
call.

Commit ca9afe6f87b569cdf8e797395381f18ae23a2905 "v4l2-dev: fix race condition"
added the video_is_registered() call to where it was originally (inside the
videodev_lock critical section), but it didn't bother to remove the duplicate
second video_is_registered call.

So that's how v4l2_open ended up with two calls to video_is_registered.

> 
> While commit cf5337358548b813479b58478539fc20ee86556c
> "[media] v4l2-dev: remove V4L2_FL_LOCK_ALL_FOPS"
> removed only code touching the mutex:
> 
>         mutex_unlock(&videodev_lock);
>         if (vdev->fops->open) {
> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags) &&
> -                   mutex_lock_interruptible(vdev->lock)) {
> -                       ret = -ERESTARTSYS;
> -                       goto err;
> -               }
>                 if (video_is_registered(vdev))
>                         ret = vdev->fops->open(filp);
>                 else
>                         ret = -ENODEV;
> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags))
> -                       mutex_unlock(vdev->lock);
>         }
> 
> Remove the remaining video_is_registered() call as it doesn't provide
> any real protection and just adds unnecessary overhead.

True.

> The drivers need to perform the unregistration check themselves inside
> their file operation handlers, while holding respective mutex.

No, drivers do not need to do the unregistration check. Since unregistration
is asynchronous it can happen at any time, so there really is no point in
checking for it other than in the core. If the device is unregistered while
in the middle of a file operation, then that means that any USB activity will
return an error, and that any future file operations other than release() will
be met by an error as well from the v4l2 core.

> 
> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
> ---
>  drivers/media/v4l2-core/v4l2-dev.c |    8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
> index c8859d6..1743119 100644
> --- a/drivers/media/v4l2-core/v4l2-dev.c
> +++ b/drivers/media/v4l2-core/v4l2-dev.c
> @@ -444,13 +444,9 @@ static int v4l2_open(struct inode *inode, struct file *filp)
>  	/* and increase the device refcount */
>  	video_get(vdev);
>  	mutex_unlock(&videodev_lock);
> -	if (vdev->fops->open) {
> -		if (video_is_registered(vdev))
> -			ret = vdev->fops->open(filp);
> -		else
> -			ret = -ENODEV;
> -	}
> 
> +	if (vdev->fops->open)
> +		ret = vdev->fops->open(filp);
>  	if (vdev->debug)
>  		printk(KERN_DEBUG "%s: open (%d)\n",
>  			video_device_node_name(vdev), ret);
> --
> 1.7.9.5
> 

A long story, but the patch is good, although the commit message needs work :-)

Regards,

	Hans

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-08-02 13:00 ` Hans Verkuil
@ 2013-08-07 16:49   ` Sylwester Nawrocki
  2013-08-07 17:49     ` Hans Verkuil
  0 siblings, 1 reply; 18+ messages in thread
From: Sylwester Nawrocki @ 2013-08-07 16:49 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: linux-media, laurent.pinchart, Kyungmin Park

Hi Hans,

On 08/02/2013 03:00 PM, Hans Verkuil wrote:
> Hi Sylwester,
> 
> The patch is good, but I have some issues with the commit message itself.

Thanks a lot for the detailed explanation, I just wrote this a bit 
longish changelog to possibly get some feedback and to better understand 
what is exactly going on. Currently the v4l2-core looks like a racing 
disaster to me.

> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
>> As it currently stands this code doesn't protect against any races
>> between video device open() and its unregistration. Races could be
>> avoided by doing the video_is_registered() check protected by the
>> core mutex, while the video device unregistration is also done with
>> this mutex held.
> 
> The video_unregister_device() is called completely asynchronously,
> particularly in the case of usb drivers. So it was never the goal of
> the video_is_registered() to be fool proof, since that isn't possible,
> nor is that necessary.
> 
> The goal was that the v4l2 core would use it for the various file
> operations and ioctls as a quick check whether the device was unregistered
> and to return the correct error. This prevents drivers from having to do
> the same thing.

OK, I think I just myself used this video_is_registered() flag for some
more stuff, by surrounding it with mutex_lock/mutex_unlock and putting
more stuff in between, like media_entity_cleanup(). And this probably led
me astray for a while, thinking that video_is_registered() was intended 
to be used synchronously.
For example see fimc_lite_subdev_unregistered() in drivers/media/platform/
exynos4-is/fimc-lite.c.

But as you said video_is_registered() is fully asynchronous. 

Actually I'm trying to fix a nasty race between deferred driver probing 
and video device open(). The problem is that video open() callback can 
be called after driver remove() callback was invoked.

This issue is actually not only related to deferred probing. It can be
also triggered by driver module removal or through driver's sysfs "unbind"
attribute.

Let's assume following scenario:

- a driver module is loaded
- driver probe is called, it registers video device,
- udev opens /dev/video
- after mutex_unlock(&videodev_lock); call in v4l2_open() in v4l2-core/
  v4l2-dev.c something fails in probe() and it unwinds, probe callback
  exits and the driver code code calls dev_set_drvdata(dev, NULL); as
  shown below.

static int really_probe(struct device *dev, struct device_driver *drv)
{
	...
	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
		 drv->bus->name, __func__, drv->name, dev_name(dev));
	...
	if (dev->bus->probe) {
		ret = dev->bus->probe(dev);
		if (ret)
			goto probe_failed;
	} else if (drv->probe) {
		ret = drv->probe(dev);
		if (ret)
			goto probe_failed;
	}
	...
	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
		 drv->bus->name, __func__, dev_name(dev), drv->name);
	goto done;

probe_failed:
	devres_release_all(dev);
	driver_sysfs_remove(dev);
	dev->driver = NULL;
	dev_set_drvdata(dev, NULL);

	...
	ret = 0;
done:
	...
	return ret;
}

Now we get to 

 	ret = vdev->fops->open(filp);

in v4l2_open(). This calls some driver's callback, e.g. something
like:

static int fimc_lite_open(struct file *file)
{
	struct fimc_lite *fimc = video_drvdata(file);
	struct media_entity *me = &fimc->ve.vdev.entity;
	int ret;

	mutex_lock(&fimc->lock);
	if (!video_is_registered(&fimc->ve.vdev)) {
		ret = -ENODEV;
		goto unlock;
	}

	...

	/* Mark video pipeline ending at this video node as in use. */
	if (ret == 0)
		me->use_count++;
	...
unlock:
	mutex_unlock(&fimc->lock);
	return ret;
}

Now what will video_drvdata(file); return ?

static inline void *video_drvdata(struct file *file)
{
	return video_get_drvdata(video_devdata(file));
}

static inline void *video_get_drvdata(struct video_device *vdev)
{
	return dev_get_drvdata(&vdev->dev);
}

Yes, so that will be just NULL o_O, due to the dev_set_drvdata(dev, NULL);
in really_probe(). drvdata is cleared similarly in __device_release_driver(),
right after calling driver's remove handler.

Another issue I have is that, e.g. driver/media/platform/exynos4-is/fimc-lite*
driver has empty video dev release() callback. It should be implemented
in the driver to kfree the whole driver's private data structure where
struct video_device is embedded in (struct fimc_lite). But that freeing 
really needs to be synchronized with driver's remove() call, since there 
is e.g. freed interrupt which accesses the driver's private data. I can't 
use kref from struct v4l2_device as that belongs to a different driver. 
A driver's custom reference counting comes to mind, where vdev->release() 
and drv->remove() would be decrementing the reference counter. But that
seems ugly as hell :/ And it predates devm_*.

This is all getting a bit depressing :/ Deferred probing and the 
asynchronous subdev handling just made those issues more visible, i.e.
not very good design of some parts of the v4l2-core.

>> The history of this code is that the second video_is_registered()
>> call has been added in commit ee6869afc922a9849979e49bb3bbcad7948
>> "V4L/DVB: v4l2: add core serialization lock" together with addition
>> of the core mutex support in fops:
>>
>>         mutex_unlock(&videodev_lock);
>> -       if (vdev->fops->open)
>> -               ret = vdev->fops->open(filp);
>> +       if (vdev->fops->open) {
>> +               if (vdev->lock)
>> +                       mutex_lock(vdev->lock);
>> +               if (video_is_registered(vdev))
>> +                       ret = vdev->fops->open(filp);
>> +               else
>> +                       ret = -ENODEV;
>> +               if (vdev->lock)
>> +                       mutex_unlock(vdev->lock);
>> +       }
> 
> The history is slightly more complicated: this commit moved the video_is_registered
> call from before the mutex_unlock(&videodev_lock); to just before the fops->open
> call.
> 
> Commit ca9afe6f87b569cdf8e797395381f18ae23a2905 "v4l2-dev: fix race condition"
> added the video_is_registered() call to where it was originally (inside the
> videodev_lock critical section), but it didn't bother to remove the duplicate
> second video_is_registered call.
> 
> So that's how v4l2_open ended up with two calls to video_is_registered.

Apologies for simplifying the history . I'll just drop it from the changelog, 
as it can be retrieved git. I'll try to put just concise explanation why this 
this video_is_registered() is not needed currently.

>> While commit cf5337358548b813479b58478539fc20ee86556c
>> "[media] v4l2-dev: remove V4L2_FL_LOCK_ALL_FOPS"
>> removed only code touching the mutex:
>>
>>         mutex_unlock(&videodev_lock);
>>         if (vdev->fops->open) {
>> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags) &&
>> -                   mutex_lock_interruptible(vdev->lock)) {
>> -                       ret = -ERESTARTSYS;
>> -                       goto err;
>> -               }
>>                 if (video_is_registered(vdev))
>>                         ret = vdev->fops->open(filp);
>>                 else
>>                         ret = -ENODEV;
>> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags))
>> -                       mutex_unlock(vdev->lock);
>>         }
>>
>> Remove the remaining video_is_registered() call as it doesn't provide
>> any real protection and just adds unnecessary overhead.
> 
> True.
> 
>> The drivers need to perform the unregistration check themselves inside
>> their file operation handlers, while holding respective mutex.
> 
> No, drivers do not need to do the unregistration check. Since unregistration
> is asynchronous it can happen at any time, so there really is no point in
> checking for it other than in the core. If the device is unregistered while
> in the middle of a file operation, then that means that any USB activity will
> return an error, and that any future file operations other than release() will
> be met by an error as well from the v4l2 core.

Yes, so video_is_registered() seems not very useful to use in drivers.
But as I've shown above it's not even used optimally in the v4l2-core.

>> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>> ---
>>  drivers/media/v4l2-core/v4l2-dev.c |    8 ++------
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
>> index c8859d6..1743119 100644
>> --- a/drivers/media/v4l2-core/v4l2-dev.c
>> +++ b/drivers/media/v4l2-core/v4l2-dev.c
>> @@ -444,13 +444,9 @@ static int v4l2_open(struct inode *inode, struct file *filp)
>>  	/* and increase the device refcount */
>>  	video_get(vdev);
>>  	mutex_unlock(&videodev_lock);
>> -	if (vdev->fops->open) {
>> -		if (video_is_registered(vdev))
>> -			ret = vdev->fops->open(filp);
>> -		else
>> -			ret = -ENODEV;
>> -	}
>>
>> +	if (vdev->fops->open)
>> +		ret = vdev->fops->open(filp);
>>  	if (vdev->debug)
>>  		printk(KERN_DEBUG "%s: open (%d)\n",
>>  			video_device_node_name(vdev), ret);
>> --
>> 1.7.9.5
>>
> 
> A long story, but the patch is good, although the commit message needs work :-)
> 
> Regards,
> 
> 	Hans


Regards,
Sylwester

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-08-07 16:49   ` Sylwester Nawrocki
@ 2013-08-07 17:49     ` Hans Verkuil
  2013-08-07 22:16       ` Laurent Pinchart
                         ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Hans Verkuil @ 2013-08-07 17:49 UTC (permalink / raw)
  To: Sylwester Nawrocki; +Cc: linux-media, laurent.pinchart, Kyungmin Park

On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
> Hi Hans,
> 
> On 08/02/2013 03:00 PM, Hans Verkuil wrote:
>> Hi Sylwester,
>>
>> The patch is good, but I have some issues with the commit message itself.
> 
> Thanks a lot for the detailed explanation, I just wrote this a bit 
> longish changelog to possibly get some feedback and to better understand 
> what is exactly going on. Currently the v4l2-core looks like a racing 
> disaster to me.
> 
>> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
>>> As it currently stands this code doesn't protect against any races
>>> between video device open() and its unregistration. Races could be
>>> avoided by doing the video_is_registered() check protected by the
>>> core mutex, while the video device unregistration is also done with
>>> this mutex held.
>>
>> The video_unregister_device() is called completely asynchronously,
>> particularly in the case of usb drivers. So it was never the goal of
>> the video_is_registered() to be fool proof, since that isn't possible,
>> nor is that necessary.
>>
>> The goal was that the v4l2 core would use it for the various file
>> operations and ioctls as a quick check whether the device was unregistered
>> and to return the correct error. This prevents drivers from having to do
>> the same thing.
> 
> OK, I think I just myself used this video_is_registered() flag for some
> more stuff, by surrounding it with mutex_lock/mutex_unlock and putting
> more stuff in between, like media_entity_cleanup().

You can't do that, because there are most likely still filehandles open
or even ioctls being executed. Cleanup happens in the release function(s)
when the kref goes to 0.

> And this probably led
> me astray for a while, thinking that video_is_registered() was intended 
> to be used synchronously.
> For example see fimc_lite_subdev_unregistered() in drivers/media/platform/
> exynos4-is/fimc-lite.c.
> 
> But as you said video_is_registered() is fully asynchronous. 
> 
> Actually I'm trying to fix a nasty race between deferred driver probing 
> and video device open(). The problem is that video open() callback can 
> be called after driver remove() callback was invoked.

How is that possible? The video_device_register must be the last thing in
the probe function. If it succeeds, then the probe must succeed as well.

Note that I now realize that this might fail in the case of multiple device
nodes being registered. We never had problems with that in the past, but in
theory you can the race condition you mention in that scenario. The correct
approach here would probably be to always return 0 in probe() if only some
of the video_device_register calls fail.

Anyway, assuming that only one device node is created, then I can't see how
you can get a race condition here. Any open() call will increase the module's
refcount, making it impossible to unload.

As far as I can tell, once you call rmmod it should no longer be possible to
open() an device node whose struct file_operations owner is that module (i.e.
the owner field of the file_operations struct points to that module). Looking
at the way fs/char_dev is implemented, that seems to be correctly handled by
the kernel core.

> 
> This issue is actually not only related to deferred probing. It can be
> also triggered by driver module removal or through driver's sysfs "unbind"
> attribute.
> 
> Let's assume following scenario:
> 
> - a driver module is loaded
> - driver probe is called, it registers video device,
> - udev opens /dev/video
> - after mutex_unlock(&videodev_lock); call in v4l2_open() in v4l2-core/
>   v4l2-dev.c something fails in probe()

And that shouldn't happen. That's the crucial bit: under which scenario does
this happen for you? If there is a control path where you do create a working
device node, but the probe fails, then that will indeed cause all sorts of
problems. But it shouldn't get in that situation (except I think in the case
of multiple device nodes, which is something I need to think about).

> and it unwinds, probe callback
>   exits and the driver code code calls dev_set_drvdata(dev, NULL); as
>   shown below.
> 
> static int really_probe(struct device *dev, struct device_driver *drv)
> {
> 	...
> 	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
> 		 drv->bus->name, __func__, drv->name, dev_name(dev));
> 	...
> 	if (dev->bus->probe) {
> 		ret = dev->bus->probe(dev);
> 		if (ret)
> 			goto probe_failed;
> 	} else if (drv->probe) {
> 		ret = drv->probe(dev);
> 		if (ret)
> 			goto probe_failed;
> 	}
> 	...
> 	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
> 		 drv->bus->name, __func__, dev_name(dev), drv->name);
> 	goto done;
> 
> probe_failed:
> 	devres_release_all(dev);
> 	driver_sysfs_remove(dev);
> 	dev->driver = NULL;
> 	dev_set_drvdata(dev, NULL);
> 
> 	...
> 	ret = 0;
> done:
> 	...
> 	return ret;
> }
> 
> Now we get to 
> 
>  	ret = vdev->fops->open(filp);
> 
> in v4l2_open(). This calls some driver's callback, e.g. something
> like:
> 
> static int fimc_lite_open(struct file *file)
> {
> 	struct fimc_lite *fimc = video_drvdata(file);
> 	struct media_entity *me = &fimc->ve.vdev.entity;
> 	int ret;
> 
> 	mutex_lock(&fimc->lock);
> 	if (!video_is_registered(&fimc->ve.vdev)) {
> 		ret = -ENODEV;
> 		goto unlock;
> 	}
> 
> 	...
> 
> 	/* Mark video pipeline ending at this video node as in use. */
> 	if (ret == 0)
> 		me->use_count++;
> 	...
> unlock:
> 	mutex_unlock(&fimc->lock);
> 	return ret;
> }
> 
> Now what will video_drvdata(file); return ?
> 
> static inline void *video_drvdata(struct file *file)
> {
> 	return video_get_drvdata(video_devdata(file));
> }
> 
> static inline void *video_get_drvdata(struct video_device *vdev)
> {
> 	return dev_get_drvdata(&vdev->dev);
> }
> 
> Yes, so that will be just NULL o_O, due to the dev_set_drvdata(dev, NULL);
> in really_probe(). drvdata is cleared similarly in __device_release_driver(),
> right after calling driver's remove handler.
> 
> Another issue I have is that, e.g. driver/media/platform/exynos4-is/fimc-lite*
> driver has empty video dev release() callback. It should be implemented
> in the driver to kfree the whole driver's private data structure where
> struct video_device is embedded in (struct fimc_lite). But that freeing 
> really needs to be synchronized with driver's remove() call, since there 
> is e.g. freed interrupt which accesses the driver's private data. I can't 
> use kref from struct v4l2_device as that belongs to a different driver. 
> A driver's custom reference counting comes to mind, where vdev->release() 
> and drv->remove() would be decrementing the reference counter. But that
> seems ugly as hell :/ And it predates devm_*.
> 
> This is all getting a bit depressing :/ Deferred probing and the 
> asynchronous subdev handling just made those issues more visible, i.e.
> not very good design of some parts of the v4l2-core.

It's just not clear to me how exactly things go wrong for you. Ping me on irc
tomorrow and we can discuss it further. I have reworked refcounting in the
past (at the time it was *really* bad), so perhaps we need to rework it again,
particularly with video nodes associated with subdevices in the mix, something
that didn't exist at the time.

> 
>>> The history of this code is that the second video_is_registered()
>>> call has been added in commit ee6869afc922a9849979e49bb3bbcad7948
>>> "V4L/DVB: v4l2: add core serialization lock" together with addition
>>> of the core mutex support in fops:
>>>
>>>         mutex_unlock(&videodev_lock);
>>> -       if (vdev->fops->open)
>>> -               ret = vdev->fops->open(filp);
>>> +       if (vdev->fops->open) {
>>> +               if (vdev->lock)
>>> +                       mutex_lock(vdev->lock);
>>> +               if (video_is_registered(vdev))
>>> +                       ret = vdev->fops->open(filp);
>>> +               else
>>> +                       ret = -ENODEV;
>>> +               if (vdev->lock)
>>> +                       mutex_unlock(vdev->lock);
>>> +       }
>>
>> The history is slightly more complicated: this commit moved the video_is_registered
>> call from before the mutex_unlock(&videodev_lock); to just before the fops->open
>> call.
>>
>> Commit ca9afe6f87b569cdf8e797395381f18ae23a2905 "v4l2-dev: fix race condition"
>> added the video_is_registered() call to where it was originally (inside the
>> videodev_lock critical section), but it didn't bother to remove the duplicate
>> second video_is_registered call.
>>
>> So that's how v4l2_open ended up with two calls to video_is_registered.
> 
> Apologies for simplifying the history . I'll just drop it from the changelog, 
> as it can be retrieved git. I'll try to put just concise explanation why this 
> this video_is_registered() is not needed currently.
> 
>>> While commit cf5337358548b813479b58478539fc20ee86556c
>>> "[media] v4l2-dev: remove V4L2_FL_LOCK_ALL_FOPS"
>>> removed only code touching the mutex:
>>>
>>>         mutex_unlock(&videodev_lock);
>>>         if (vdev->fops->open) {
>>> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags) &&
>>> -                   mutex_lock_interruptible(vdev->lock)) {
>>> -                       ret = -ERESTARTSYS;
>>> -                       goto err;
>>> -               }
>>>                 if (video_is_registered(vdev))
>>>                         ret = vdev->fops->open(filp);
>>>                 else
>>>                         ret = -ENODEV;
>>> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags))
>>> -                       mutex_unlock(vdev->lock);
>>>         }
>>>
>>> Remove the remaining video_is_registered() call as it doesn't provide
>>> any real protection and just adds unnecessary overhead.
>>
>> True.
>>
>>> The drivers need to perform the unregistration check themselves inside
>>> their file operation handlers, while holding respective mutex.
>>
>> No, drivers do not need to do the unregistration check. Since unregistration
>> is asynchronous it can happen at any time, so there really is no point in
>> checking for it other than in the core. If the device is unregistered while
>> in the middle of a file operation, then that means that any USB activity will
>> return an error, and that any future file operations other than release() will
>> be met by an error as well from the v4l2 core.
> 
> Yes, so video_is_registered() seems not very useful to use in drivers.

That's true. The main use case for it is in the v4l2 core to stop ioctls and fops
from going through to the driver and return an error instead. So drivers don't
need to do that themselves.

> But as I've shown above it's not even used optimally in the v4l2-core.

There really isn't anything else you can do with it in my view.

Regards,

	Hans

> 
>>> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>> ---
>>>  drivers/media/v4l2-core/v4l2-dev.c |    8 ++------
>>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
>>> index c8859d6..1743119 100644
>>> --- a/drivers/media/v4l2-core/v4l2-dev.c
>>> +++ b/drivers/media/v4l2-core/v4l2-dev.c
>>> @@ -444,13 +444,9 @@ static int v4l2_open(struct inode *inode, struct file *filp)
>>>  	/* and increase the device refcount */
>>>  	video_get(vdev);
>>>  	mutex_unlock(&videodev_lock);
>>> -	if (vdev->fops->open) {
>>> -		if (video_is_registered(vdev))
>>> -			ret = vdev->fops->open(filp);
>>> -		else
>>> -			ret = -ENODEV;
>>> -	}
>>>
>>> +	if (vdev->fops->open)
>>> +		ret = vdev->fops->open(filp);
>>>  	if (vdev->debug)
>>>  		printk(KERN_DEBUG "%s: open (%d)\n",
>>>  			video_device_node_name(vdev), ret);
>>> --
>>> 1.7.9.5
>>>
>>
>> A long story, but the patch is good, although the commit message needs work :-)
>>
>> Regards,
>>
>> 	Hans
> 
> 
> Regards,
> Sylwester
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-08-07 17:49     ` Hans Verkuil
@ 2013-08-07 22:16       ` Laurent Pinchart
  2013-08-08 12:36       ` Andrzej Hajda
  2013-09-05 22:33       ` Sylwester Nawrocki
  2 siblings, 0 replies; 18+ messages in thread
From: Laurent Pinchart @ 2013-08-07 22:16 UTC (permalink / raw)
  To: Hans Verkuil; +Cc: Sylwester Nawrocki, linux-media, Kyungmin Park

Hi Hans and Sylwester,

On Wednesday 07 August 2013 19:49:53 Hans Verkuil wrote:
> On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
> > On 08/02/2013 03:00 PM, Hans Verkuil wrote:
> >> Hi Sylwester,
> >> 
> >> The patch is good, but I have some issues with the commit message itself.
> > 
> > Thanks a lot for the detailed explanation, I just wrote this a bit
> > longish changelog to possibly get some feedback and to better understand
> > what is exactly going on. Currently the v4l2-core looks like a racing
> > disaster to me.
> > 
> >> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
> >>> As it currently stands this code doesn't protect against any races
> >>> between video device open() and its unregistration. Races could be
> >>> avoided by doing the video_is_registered() check protected by the
> >>> core mutex, while the video device unregistration is also done with
> >>> this mutex held.
> >> 
> >> The video_unregister_device() is called completely asynchronously,
> >> particularly in the case of usb drivers. So it was never the goal of
> >> the video_is_registered() to be fool proof, since that isn't possible,
> >> nor is that necessary.
> >> 
> >> The goal was that the v4l2 core would use it for the various file
> >> operations and ioctls as a quick check whether the device was
> >> unregistered and to return the correct error. This prevents drivers from
> >> having to do the same thing.
> > 
> > OK, I think I just myself used this video_is_registered() flag for some
> > more stuff, by surrounding it with mutex_lock/mutex_unlock and putting
> > more stuff in between, like media_entity_cleanup().
> 
> You can't do that, because there are most likely still filehandles open
> or even ioctls being executed. Cleanup happens in the release function(s)
> when the kref goes to 0.
> 
> > And this probably led me astray for a while, thinking that
> > video_is_registered() was intended to be used synchronously.
> > For example see fimc_lite_subdev_unregistered() in drivers/media/platform/
> > exynos4-is/fimc-lite.c.
> > 
> > But as you said video_is_registered() is fully asynchronous.
> > 
> > Actually I'm trying to fix a nasty race between deferred driver probing
> > and video device open(). The problem is that video open() callback can
> > be called after driver remove() callback was invoked.
> 
> How is that possible? The video_device_register must be the last thing in
> the probe function. If it succeeds, then the probe must succeed as well.
> 
> Note that I now realize that this might fail in the case of multiple device
> nodes being registered. We never had problems with that in the past, but in
> theory you can the race condition you mention in that scenario. The correct
> approach here would probably be to always return 0 in probe() if only some
> of the video_device_register calls fail.

Hmmmm... Returning success in probe when probing partly fails doesn't sound 
very good to me.

Once you call video_device_register() you should be prepared to handle 
userspace calls. The device node can be opened, in which case the module 
refcount will be incremented, but probe() can still fail. However, the video 
device becomes refcounted as soon as it's registered, so drivers should only 
release resources in the release callback. This would unfortunately mean that 
the devm_* helpers can't be used.

I would be surprised if this problem was specific to V4L2. It might be 
something we should try to solve with the help of the device core.

> Anyway, assuming that only one device node is created, then I can't see how
> you can get a race condition here. Any open() call will increase the
> module's refcount, making it impossible to unload.
> 
> As far as I can tell, once you call rmmod it should no longer be possible to
> open() an device node whose struct file_operations owner is that module
> (i.e. the owner field of the file_operations struct points to that module).
> Looking at the way fs/char_dev is implemented, that seems to be correctly
> handled by the kernel core.
> 
> > This issue is actually not only related to deferred probing. It can be
> > also triggered by driver module removal or through driver's sysfs "unbind"
> > attribute.
> > 
> > Let's assume following scenario:
> > 
> > - a driver module is loaded
> > - driver probe is called, it registers video device,
> > - udev opens /dev/video
> > - after mutex_unlock(&videodev_lock); call in v4l2_open() in v4l2-core/
> >   v4l2-dev.c something fails in probe()
> 
> And that shouldn't happen. That's the crucial bit: under which scenario does
> this happen for you? If there is a control path where you do create a
> working device node, but the probe fails, then that will indeed cause all
> sorts of problems. But it shouldn't get in that situation (except I think
> in the case of multiple device nodes, which is something I need to think
> about).
>
> >   and it unwinds, probe callback exits and the driver code code calls
> >   dev_set_drvdata(dev, NULL); as shown below.
> > 
> > static int really_probe(struct device *dev, struct device_driver *drv)
> > {
> > 	...
> > 	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
> > 		 drv->bus->name, __func__, drv->name, dev_name(dev));
> > 	...
> > 	if (dev->bus->probe) {
> > 		ret = dev->bus->probe(dev);
> > 		if (ret)
> > 			goto probe_failed;
> > 	
> > 	} else if (drv->probe) {
> > 		ret = drv->probe(dev);
> > 		if (ret)
> > 			goto probe_failed;
> > 	}
> > 	...
> > 	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
> > 		 drv->bus->name, __func__, dev_name(dev), drv->name);
> > 	
> > 	goto done;
> > 
> > probe_failed:
> > 	devres_release_all(dev);
> > 	driver_sysfs_remove(dev);
> > 	dev->driver = NULL;
> > 	dev_set_drvdata(dev, NULL);
> > 	...
> > 	ret = 0;
> > 
> > done:
> > 	...
> > 	return ret;
> > }
> > 
> > Now we get to
> > 
> >  	ret = vdev->fops->open(filp);
> > 
> > in v4l2_open(). This calls some driver's callback, e.g. something
> > like:
> > 
> > static int fimc_lite_open(struct file *file)
> > {
> > 	struct fimc_lite *fimc = video_drvdata(file);
> > 	struct media_entity *me = &fimc->ve.vdev.entity;
> > 	int ret;
> > 	
> > 	mutex_lock(&fimc->lock);
> > 	if (!video_is_registered(&fimc->ve.vdev)) {
> > 		ret = -ENODEV;
> > 		goto unlock;
> > 	}
> > 	
> > 	...
> > 	
> > 	/* Mark video pipeline ending at this video node as in use. */
> > 	if (ret == 0)
> > 		me->use_count++;
> > 	
> > 	...
> > 
> > unlock:
> > 	mutex_unlock(&fimc->lock);
> > 	return ret;
> > }
> > 
> > Now what will video_drvdata(file); return ?
> > 
> > static inline void *video_drvdata(struct file *file)
> > {
> > 	return video_get_drvdata(video_devdata(file));
> > }
> > 
> > static inline void *video_get_drvdata(struct video_device *vdev)
> > {
> > 	return dev_get_drvdata(&vdev->dev);
> > }
> > 
> > Yes, so that will be just NULL o_O, due to the dev_set_drvdata(dev, NULL);
> > in really_probe(). drvdata is cleared similarly in
> > __device_release_driver(), right after calling driver's remove handler.
> > 
> > Another issue I have is that, e.g.
> > driver/media/platform/exynos4-is/fimc-lite* driver has empty video dev
> > release() callback. It should be implemented in the driver to kfree the
> > whole driver's private data structure where struct video_device is
> > embedded in (struct fimc_lite). But that freeing really needs to be
> > synchronized with driver's remove() call, since there is e.g. freed
> > interrupt which accesses the driver's private data. I can't use kref from
> > struct v4l2_device as that belongs to a different driver. A driver's
> > custom reference counting comes to mind, where vdev->release() and
> > drv->remove() would be decrementing the reference counter. But that seems
> > ugly as hell :/ And it predates devm_*.
> > 
> > This is all getting a bit depressing :/ Deferred probing and the
> > asynchronous subdev handling just made those issues more visible, i.e.
> > not very good design of some parts of the v4l2-core.
> 
> It's just not clear to me how exactly things go wrong for you. Ping me on
> irc tomorrow and we can discuss it further. I have reworked refcounting in
> the past (at the time it was *really* bad), so perhaps we need to rework it
> again, particularly with video nodes associated with subdevices in the mix,
> something that didn't exist at the time.
> 
> >>> The history of this code is that the second video_is_registered()
> >>> call has been added in commit ee6869afc922a9849979e49bb3bbcad7948
> >>> "V4L/DVB: v4l2: add core serialization lock" together with addition
> >>> 
> >>> of the core mutex support in fops:
> >>>         mutex_unlock(&videodev_lock);
> >>> 
> >>> -       if (vdev->fops->open)
> >>> -               ret = vdev->fops->open(filp);
> >>> +       if (vdev->fops->open) {
> >>> +               if (vdev->lock)
> >>> +                       mutex_lock(vdev->lock);
> >>> +               if (video_is_registered(vdev))
> >>> +                       ret = vdev->fops->open(filp);
> >>> +               else
> >>> +                       ret = -ENODEV;
> >>> +               if (vdev->lock)
> >>> +                       mutex_unlock(vdev->lock);
> >>> +       }
> >> 
> >> The history is slightly more complicated: this commit moved the
> >> video_is_registered call from before the mutex_unlock(&videodev_lock);
> >> to just before the fops->open call.
> >> 
> >> Commit ca9afe6f87b569cdf8e797395381f18ae23a2905 "v4l2-dev: fix race
> >> condition" added the video_is_registered() call to where it was
> >> originally (inside the videodev_lock critical section), but it didn't
> >> bother to remove the duplicate second video_is_registered call.
> >> 
> >> So that's how v4l2_open ended up with two calls to video_is_registered.
> > 
> > Apologies for simplifying the history . I'll just drop it from the
> > changelog, as it can be retrieved git. I'll try to put just concise
> > explanation why this this video_is_registered() is not needed currently.
> > 
> >>> While commit cf5337358548b813479b58478539fc20ee86556c
> >>> "[media] v4l2-dev: remove V4L2_FL_LOCK_ALL_FOPS"
> >>> 
> >>> removed only code touching the mutex:
> >>>         mutex_unlock(&videodev_lock);
> >>>         if (vdev->fops->open) {
> >>> 
> >>> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags) &&
> >>> -                   mutex_lock_interruptible(vdev->lock)) {
> >>> -                       ret = -ERESTARTSYS;
> >>> -                       goto err;
> >>> -               }
> >>> 
> >>>                 if (video_is_registered(vdev))
> >>>                 
> >>>                         ret = vdev->fops->open(filp);
> >>>                 
> >>>                 else
> >>>                 
> >>>                         ret = -ENODEV;
> >>> 
> >>> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags))
> >>> -                       mutex_unlock(vdev->lock);
> >>> 
> >>>         }
> >>> 
> >>> Remove the remaining video_is_registered() call as it doesn't provide
> >>> any real protection and just adds unnecessary overhead.
> >> 
> >> True.
> >> 
> >>> The drivers need to perform the unregistration check themselves inside
> >>> their file operation handlers, while holding respective mutex.
> >> 
> >> No, drivers do not need to do the unregistration check. Since
> >> unregistration is asynchronous it can happen at any time, so there
> >> really is no point in checking for it other than in the core. If the
> >> device is unregistered while in the middle of a file operation, then
> >> that means that any USB activity will return an error, and that any
> >> future file operations other than release() will be met by an error as
> >> well from the v4l2 core.
> > 
> > Yes, so video_is_registered() seems not very useful to use in drivers.
> 
> That's true. The main use case for it is in the v4l2 core to stop ioctls and
> fops from going through to the driver and return an error instead. So
> drivers don't need to do that themselves.

I'd like to add that the video_is_registered() call in v4l2_open() really 
protects from a race condition with video_unregister_device() and makes sure 
that either v4l2_open() gets a reference to the device before it gets 
unregistered in video_unregister_device(), or v4l2_open() fails the registered 
check and returns an error.

> > But as I've shown above it's not even used optimally in the v4l2-core.
> 
> There really isn't anything else you can do with it in my view.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-08-07 17:49     ` Hans Verkuil
  2013-08-07 22:16       ` Laurent Pinchart
@ 2013-08-08 12:36       ` Andrzej Hajda
  2013-09-05 22:33       ` Sylwester Nawrocki
  2 siblings, 0 replies; 18+ messages in thread
From: Andrzej Hajda @ 2013-08-08 12:36 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Sylwester Nawrocki, linux-media, laurent.pinchart, Kyungmin Park

Hi,

As a person who debugged the problem I would like add my comments.

On 08/07/2013 07:49 PM, Hans Verkuil wrote:
> On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
>> Hi Hans,
>>
>> On 08/02/2013 03:00 PM, Hans Verkuil wrote:
>>> Hi Sylwester,
>>>
>>> The patch is good, but I have some issues with the commit message itself.
>>
>> Thanks a lot for the detailed explanation, I just wrote this a bit 
>> longish changelog to possibly get some feedback and to better understand 
>> what is exactly going on. Currently the v4l2-core looks like a racing 
>> disaster to me.
>>
>>> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
>>>> As it currently stands this code doesn't protect against any races
>>>> between video device open() and its unregistration. Races could be
>>>> avoided by doing the video_is_registered() check protected by the
>>>> core mutex, while the video device unregistration is also done with
>>>> this mutex held.
>>>
>>> The video_unregister_device() is called completely asynchronously,
>>> particularly in the case of usb drivers. So it was never the goal of
>>> the video_is_registered() to be fool proof, since that isn't possible,
>>> nor is that necessary.
>>>
>>> The goal was that the v4l2 core would use it for the various file
>>> operations and ioctls as a quick check whether the device was unregistered
>>> and to return the correct error. This prevents drivers from having to do
>>> the same thing.
>>
>> OK, I think I just myself used this video_is_registered() flag for some
>> more stuff, by surrounding it with mutex_lock/mutex_unlock and putting
>> more stuff in between, like media_entity_cleanup().
> 
> You can't do that, because there are most likely still filehandles open
> or even ioctls being executed. Cleanup happens in the release function(s)
> when the kref goes to 0.
> 
>> And this probably led
>> me astray for a while, thinking that video_is_registered() was intended 
>> to be used synchronously.
>> For example see fimc_lite_subdev_unregistered() in drivers/media/platform/
>> exynos4-is/fimc-lite.c.
>>
>> But as you said video_is_registered() is fully asynchronous. 
>>
>> Actually I'm trying to fix a nasty race between deferred driver probing 
>> and video device open(). The problem is that video open() callback can 
>> be called after driver remove() callback was invoked.
> 
> How is that possible? The video_device_register must be the last thing in
> the probe function. If it succeeds, then the probe must succeed as well.
> 
> Note that I now realize that this might fail in the case of multiple device
> nodes being registered. We never had problems with that in the past, but in
> theory you can the race condition you mention in that scenario. The correct
> approach here would probably be to always return 0 in probe() if only some
> of the video_device_register calls fail.
> 
> Anyway, assuming that only one device node is created, then I can't see how
> you can get a race condition here. Any open() call will increase the module's
> refcount, making it impossible to unload.

The problem is that increasing refcount does not prevent the driver from
unbinding via sysfs unbind attribute. Drivers/core are not handling
such situation correctly. For example during v4l2 open callback
driver_data is not protected by any lock so it can disappear in the most
inconvenient time and cause system crash/hang (proven by real tests).

The patch introducing unbind attribute suggests that it could be used in
production systems ( https://lkml.org/lkml/2005/6/24/27 ) so I suppose
the situation should be better handled.

device_lock is used by unbind, so maybe it could be used also by callbacks?


Regards
Andrzej


> 
> As far as I can tell, once you call rmmod it should no longer be possible to
> open() an device node whose struct file_operations owner is that module (i.e.
> the owner field of the file_operations struct points to that module). Looking
> at the way fs/char_dev is implemented, that seems to be correctly handled by
> the kernel core.
> 
>>
>> This issue is actually not only related to deferred probing. It can be
>> also triggered by driver module removal or through driver's sysfs "unbind"
>> attribute.
>>
>> Let's assume following scenario:
>>
>> - a driver module is loaded
>> - driver probe is called, it registers video device,
>> - udev opens /dev/video
>> - after mutex_unlock(&videodev_lock); call in v4l2_open() in v4l2-core/
>>   v4l2-dev.c something fails in probe()
> 
> And that shouldn't happen. That's the crucial bit: under which scenario does
> this happen for you? If there is a control path where you do create a working
> device node, but the probe fails, then that will indeed cause all sorts of
> problems. But it shouldn't get in that situation (except I think in the case
> of multiple device nodes, which is something I need to think about).
> 
>> and it unwinds, probe callback
>>   exits and the driver code code calls dev_set_drvdata(dev, NULL); as
>>   shown below.
>>
>> static int really_probe(struct device *dev, struct device_driver *drv)
>> {
>> 	...
>> 	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
>> 		 drv->bus->name, __func__, drv->name, dev_name(dev));
>> 	...
>> 	if (dev->bus->probe) {
>> 		ret = dev->bus->probe(dev);
>> 		if (ret)
>> 			goto probe_failed;
>> 	} else if (drv->probe) {
>> 		ret = drv->probe(dev);
>> 		if (ret)
>> 			goto probe_failed;
>> 	}
>> 	...
>> 	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
>> 		 drv->bus->name, __func__, dev_name(dev), drv->name);
>> 	goto done;
>>
>> probe_failed:
>> 	devres_release_all(dev);
>> 	driver_sysfs_remove(dev);
>> 	dev->driver = NULL;
>> 	dev_set_drvdata(dev, NULL);
>>
>> 	...
>> 	ret = 0;
>> done:
>> 	...
>> 	return ret;
>> }
>>
>> Now we get to 
>>
>>  	ret = vdev->fops->open(filp);
>>
>> in v4l2_open(). This calls some driver's callback, e.g. something
>> like:
>>
>> static int fimc_lite_open(struct file *file)
>> {
>> 	struct fimc_lite *fimc = video_drvdata(file);
>> 	struct media_entity *me = &fimc->ve.vdev.entity;
>> 	int ret;
>>
>> 	mutex_lock(&fimc->lock);
>> 	if (!video_is_registered(&fimc->ve.vdev)) {
>> 		ret = -ENODEV;
>> 		goto unlock;
>> 	}
>>
>> 	...
>>
>> 	/* Mark video pipeline ending at this video node as in use. */
>> 	if (ret == 0)
>> 		me->use_count++;
>> 	...
>> unlock:
>> 	mutex_unlock(&fimc->lock);
>> 	return ret;
>> }
>>
>> Now what will video_drvdata(file); return ?
>>
>> static inline void *video_drvdata(struct file *file)
>> {
>> 	return video_get_drvdata(video_devdata(file));
>> }
>>
>> static inline void *video_get_drvdata(struct video_device *vdev)
>> {
>> 	return dev_get_drvdata(&vdev->dev);
>> }
>>
>> Yes, so that will be just NULL o_O, due to the dev_set_drvdata(dev, NULL);
>> in really_probe(). drvdata is cleared similarly in __device_release_driver(),
>> right after calling driver's remove handler.
>>
>> Another issue I have is that, e.g. driver/media/platform/exynos4-is/fimc-lite*
>> driver has empty video dev release() callback. It should be implemented
>> in the driver to kfree the whole driver's private data structure where
>> struct video_device is embedded in (struct fimc_lite). But that freeing 
>> really needs to be synchronized with driver's remove() call, since there 
>> is e.g. freed interrupt which accesses the driver's private data. I can't 
>> use kref from struct v4l2_device as that belongs to a different driver. 
>> A driver's custom reference counting comes to mind, where vdev->release() 
>> and drv->remove() would be decrementing the reference counter. But that
>> seems ugly as hell :/ And it predates devm_*.
>>
>> This is all getting a bit depressing :/ Deferred probing and the 
>> asynchronous subdev handling just made those issues more visible, i.e.
>> not very good design of some parts of the v4l2-core.
> 
> It's just not clear to me how exactly things go wrong for you. Ping me on irc
> tomorrow and we can discuss it further. I have reworked refcounting in the
> past (at the time it was *really* bad), so perhaps we need to rework it again,
> particularly with video nodes associated with subdevices in the mix, something
> that didn't exist at the time.
> 
>>
>>>> The history of this code is that the second video_is_registered()
>>>> call has been added in commit ee6869afc922a9849979e49bb3bbcad7948
>>>> "V4L/DVB: v4l2: add core serialization lock" together with addition
>>>> of the core mutex support in fops:
>>>>
>>>>         mutex_unlock(&videodev_lock);
>>>> -       if (vdev->fops->open)
>>>> -               ret = vdev->fops->open(filp);
>>>> +       if (vdev->fops->open) {
>>>> +               if (vdev->lock)
>>>> +                       mutex_lock(vdev->lock);
>>>> +               if (video_is_registered(vdev))
>>>> +                       ret = vdev->fops->open(filp);
>>>> +               else
>>>> +                       ret = -ENODEV;
>>>> +               if (vdev->lock)
>>>> +                       mutex_unlock(vdev->lock);
>>>> +       }
>>>
>>> The history is slightly more complicated: this commit moved the video_is_registered
>>> call from before the mutex_unlock(&videodev_lock); to just before the fops->open
>>> call.
>>>
>>> Commit ca9afe6f87b569cdf8e797395381f18ae23a2905 "v4l2-dev: fix race condition"
>>> added the video_is_registered() call to where it was originally (inside the
>>> videodev_lock critical section), but it didn't bother to remove the duplicate
>>> second video_is_registered call.
>>>
>>> So that's how v4l2_open ended up with two calls to video_is_registered.
>>
>> Apologies for simplifying the history . I'll just drop it from the changelog, 
>> as it can be retrieved git. I'll try to put just concise explanation why this 
>> this video_is_registered() is not needed currently.
>>
>>>> While commit cf5337358548b813479b58478539fc20ee86556c
>>>> "[media] v4l2-dev: remove V4L2_FL_LOCK_ALL_FOPS"
>>>> removed only code touching the mutex:
>>>>
>>>>         mutex_unlock(&videodev_lock);
>>>>         if (vdev->fops->open) {
>>>> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags) &&
>>>> -                   mutex_lock_interruptible(vdev->lock)) {
>>>> -                       ret = -ERESTARTSYS;
>>>> -                       goto err;
>>>> -               }
>>>>                 if (video_is_registered(vdev))
>>>>                         ret = vdev->fops->open(filp);
>>>>                 else
>>>>                         ret = -ENODEV;
>>>> -               if (test_bit(V4L2_FL_LOCK_ALL_FOPS, &vdev->flags))
>>>> -                       mutex_unlock(vdev->lock);
>>>>         }
>>>>
>>>> Remove the remaining video_is_registered() call as it doesn't provide
>>>> any real protection and just adds unnecessary overhead.
>>>
>>> True.
>>>
>>>> The drivers need to perform the unregistration check themselves inside
>>>> their file operation handlers, while holding respective mutex.
>>>
>>> No, drivers do not need to do the unregistration check. Since unregistration
>>> is asynchronous it can happen at any time, so there really is no point in
>>> checking for it other than in the core. If the device is unregistered while
>>> in the middle of a file operation, then that means that any USB activity will
>>> return an error, and that any future file operations other than release() will
>>> be met by an error as well from the v4l2 core.
>>
>> Yes, so video_is_registered() seems not very useful to use in drivers.
> 
> That's true. The main use case for it is in the v4l2 core to stop ioctls and fops
> from going through to the driver and return an error instead. So drivers don't
> need to do that themselves.
> 
>> But as I've shown above it's not even used optimally in the v4l2-core.
> 
> There really isn't anything else you can do with it in my view.
> 
> Regards,
> 
> 	Hans
> 
>>
>>>> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
>>>> Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
>>>> ---
>>>>  drivers/media/v4l2-core/v4l2-dev.c |    8 ++------
>>>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
>>>> index c8859d6..1743119 100644
>>>> --- a/drivers/media/v4l2-core/v4l2-dev.c
>>>> +++ b/drivers/media/v4l2-core/v4l2-dev.c
>>>> @@ -444,13 +444,9 @@ static int v4l2_open(struct inode *inode, struct file *filp)
>>>>  	/* and increase the device refcount */
>>>>  	video_get(vdev);
>>>>  	mutex_unlock(&videodev_lock);
>>>> -	if (vdev->fops->open) {
>>>> -		if (video_is_registered(vdev))
>>>> -			ret = vdev->fops->open(filp);
>>>> -		else
>>>> -			ret = -ENODEV;
>>>> -	}
>>>>
>>>> +	if (vdev->fops->open)
>>>> +		ret = vdev->fops->open(filp);
>>>>  	if (vdev->debug)
>>>>  		printk(KERN_DEBUG "%s: open (%d)\n",
>>>>  			video_device_node_name(vdev), ret);
>>>> --
>>>> 1.7.9.5
>>>>
>>>
>>> A long story, but the patch is good, although the commit message needs work :-)
>>>
>>> Regards,
>>>
>>> 	Hans
>>
>>
>> Regards,
>> Sylwester
>>
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-08-07 17:49     ` Hans Verkuil
  2013-08-07 22:16       ` Laurent Pinchart
  2013-08-08 12:36       ` Andrzej Hajda
@ 2013-09-05 22:33       ` Sylwester Nawrocki
  2013-09-09  9:07         ` Hans Verkuil
  2 siblings, 1 reply; 18+ messages in thread
From: Sylwester Nawrocki @ 2013-09-05 22:33 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Sylwester Nawrocki, linux-media, laurent.pinchart, Kyungmin Park

Hi Hans,

Sorry for putting this on a back burner for a while.

On 08/07/2013 07:49 PM, Hans Verkuil wrote:
> On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
>> Hi Hans,
>>
>> On 08/02/2013 03:00 PM, Hans Verkuil wrote:
>>> Hi Sylwester,
>>>
>>> The patch is good, but I have some issues with the commit message itself.
>>
>> Thanks a lot for the detailed explanation, I just wrote this a bit
>> longish changelog to possibly get some feedback and to better understand
>> what is exactly going on. Currently the v4l2-core looks like a racing
>> disaster to me.
>>
>>> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
>>>> As it currently stands this code doesn't protect against any races
>>>> between video device open() and its unregistration. Races could be
>>>> avoided by doing the video_is_registered() check protected by the
>>>> core mutex, while the video device unregistration is also done with
>>>> this mutex held.
>>>
>>> The video_unregister_device() is called completely asynchronously,
>>> particularly in the case of usb drivers. So it was never the goal of
>>> the video_is_registered() to be fool proof, since that isn't possible,
>>> nor is that necessary.
>>>
>>> The goal was that the v4l2 core would use it for the various file
>>> operations and ioctls as a quick check whether the device was unregistered
>>> and to return the correct error. This prevents drivers from having to do
>>> the same thing.
>>
>> OK, I think I just myself used this video_is_registered() flag for some
>> more stuff, by surrounding it with mutex_lock/mutex_unlock and putting
>> more stuff in between, like media_entity_cleanup().
>
> You can't do that, because there are most likely still filehandles open
> or even ioctls being executed. Cleanup happens in the release function(s)
> when the kref goes to 0.

Yeah, a bit late, but I'm finally well aware of that.

>> And this probably led
>> me astray for a while, thinking that video_is_registered() was intended
>> to be used synchronously.
>> For example see fimc_lite_subdev_unregistered() in drivers/media/platform/
>> exynos4-is/fimc-lite.c.
>>
>> But as you said video_is_registered() is fully asynchronous.
>>
>> Actually I'm trying to fix a nasty race between deferred driver probing
>> and video device open(). The problem is that video open() callback can
>> be called after driver remove() callback was invoked.
>
> How is that possible? The video_device_register must be the last thing in
> the probe function. If it succeeds, then the probe must succeed as well.
>
> Note that I now realize that this might fail in the case of multiple device
> nodes being registered. We never had problems with that in the past, but in
> theory you can the race condition you mention in that scenario. The correct
> approach here would probably be to always return 0 in probe() if only some
> of the video_device_register calls fail.
>
> Anyway, assuming that only one device node is created, then I can't see how
> you can get a race condition here. Any open() call will increase the module's
> refcount, making it impossible to unload.

The main issue is that in the exynos4-is driver there are multiple platform
devices (these represent IP blocks that are scattered across various Exynos
SoC revisions). Drivers of this platform devices create subdevs and video
nodes are registered in subdev's .registered() callback. This allows the
drivers to be more modular and self contained. But also during video device
registration proper struct v4l2_device (and through this struct 
media_device)
can be passed so the video nodes are attached to the common media driver.

In probe() of the media driver all subdevs are registered and this triggers
video nodes creation. The media device first registers all platform devices.
Subdevs are created and all video nodes. After that are being registered
sensor subdevs and media links are created. Then all subdev devnodes are
created in a single call. Note this single call is a bit contrary to the
v4l2-sync API concepts.

So there is lots of things that may fail after first video node is 
registered,
and on which open() may be called immediately.

> As far as I can tell, once you call rmmod it should no longer be possible to
> open() an device node whose struct file_operations owner is that module (i.e.
> the owner field of the file_operations struct points to that module). Looking
> at the way fs/char_dev is implemented, that seems to be correctly handled by
> the kernel core.

Yes, that's a good news. There is only still the issue with the "unbind" 
sysfs
attribute. I couldn't see any similar checks done around code 
implementing it.

>> This issue is actually not only related to deferred probing. It can be
>> also triggered by driver module removal or through driver's sysfs "unbind"
>> attribute.
>>
>> Let's assume following scenario:
>>
>> - a driver module is loaded
>> - driver probe is called, it registers video device,
>> - udev opens /dev/video
>> - after mutex_unlock(&videodev_lock); call in v4l2_open() in v4l2-core/
>>    v4l2-dev.c something fails in probe()
>
> And that shouldn't happen. That's the crucial bit: under which scenario does
> this happen for you? If there is a control path where you do create a working
> device node, but the probe fails, then that will indeed cause all sorts of
> problems. But it shouldn't get in that situation (except I think in the case
> of multiple device nodes, which is something I need to think about).

Yes, I'm dealing with multiple device nodes created from within media 
driver's
probe(). As described above, there is quite a few things that could 
fail, even
single sub-driver created multiple nodes for same IP block (capture, 
mem-to-mem).

>> and it unwinds, probe callback
>>    exits and the driver code code calls dev_set_drvdata(dev, NULL); as
>>    shown below.
>>
>> static int really_probe(struct device *dev, struct device_driver *drv)
>> {
>> 	...
>> 	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
>> 		 drv->bus->name, __func__, drv->name, dev_name(dev));
>> 	...
>> 	if (dev->bus->probe) {
>> 		ret = dev->bus->probe(dev);
>> 		if (ret)
>> 			goto probe_failed;
>> 	} else if (drv->probe) {
>> 		ret = drv->probe(dev);
>> 		if (ret)
>> 			goto probe_failed;
>> 	}
>> 	...
>> 	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
>> 		 drv->bus->name, __func__, dev_name(dev), drv->name);
>> 	goto done;
>>
>> probe_failed:
>> 	devres_release_all(dev);
>> 	driver_sysfs_remove(dev);
>> 	dev->driver = NULL;
>> 	dev_set_drvdata(dev, NULL);
>>
>> 	...
>> 	ret = 0;
>> done:
>> 	...
>> 	return ret;
>> }
>>
>> Now we get to
>>
>>   	ret = vdev->fops->open(filp);
>>
>> in v4l2_open(). This calls some driver's callback, e.g. something
>> like:
>>
>> static int fimc_lite_open(struct file *file)
>> {
>> 	struct fimc_lite *fimc = video_drvdata(file);
>> 	struct media_entity *me =&fimc->ve.vdev.entity;
>> 	int ret;
>>
>> 	mutex_lock(&fimc->lock);
>> 	if (!video_is_registered(&fimc->ve.vdev)) {
>> 		ret = -ENODEV;
>> 		goto unlock;
>> 	}
>>
>> 	...
>>
>> 	/* Mark video pipeline ending at this video node as in use. */
>> 	if (ret == 0)
>> 		me->use_count++;
>> 	...
>> unlock:
>> 	mutex_unlock(&fimc->lock);
>> 	return ret;
>> }
>>
>> Now what will video_drvdata(file); return ?
>>
>> static inline void *video_drvdata(struct file *file)
>> {
>> 	return video_get_drvdata(video_devdata(file));
>> }
>>
>> static inline void *video_get_drvdata(struct video_device *vdev)
>> {
>> 	return dev_get_drvdata(&vdev->dev);
>> }
>>
>> Yes, so that will be just NULL o_O, due to the dev_set_drvdata(dev, NULL);
>> in really_probe(). drvdata is cleared similarly in __device_release_driver(),
>> right after calling driver's remove handler.
>>
>> Another issue I have is that, e.g. driver/media/platform/exynos4-is/fimc-lite*
>> driver has empty video dev release() callback. It should be implemented
>> in the driver to kfree the whole driver's private data structure where
>> struct video_device is embedded in (struct fimc_lite). But that freeing
>> really needs to be synchronized with driver's remove() call, since there
>> is e.g. freed interrupt which accesses the driver's private data. I can't
>> use kref from struct v4l2_device as that belongs to a different driver.
>> A driver's custom reference counting comes to mind, where vdev->release()
>> and drv->remove() would be decrementing the reference counter. But that
>> seems ugly as hell :/ And it predates devm_*.
>>
>> This is all getting a bit depressing :/ Deferred probing and the
>> asynchronous subdev handling just made those issues more visible, i.e.
>> not very good design of some parts of the v4l2-core.
>
> It's just not clear to me how exactly things go wrong for you. Ping me on irc
> tomorrow and we can discuss it further. I have reworked refcounting in the
> past (at the time it was *really* bad), so perhaps we need to rework it again,
> particularly with video nodes associated with subdevices in the mix, something
> that didn't exist at the time.

Thanks, and sorry for holding on with that for too long.

The main issue as I see it is that we need to track both driver remove() 
and
struct device .release() calls and free resources only when last of them
executes. Data structures which are referenced in fops must not be freed in
remove() and we cannot use dev_get_drvdata() in fops, e.g. not protected 
with
device_lock().

--
Regards,
Sylwester

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-05 22:33       ` Sylwester Nawrocki
@ 2013-09-09  9:07         ` Hans Verkuil
  2013-09-09 10:00           ` Laurent Pinchart
  2013-09-11 13:07           ` Sylwester Nawrocki
  0 siblings, 2 replies; 18+ messages in thread
From: Hans Verkuil @ 2013-09-09  9:07 UTC (permalink / raw)
  To: Sylwester Nawrocki
  Cc: Sylwester Nawrocki, linux-media, laurent.pinchart, Kyungmin Park

On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
> Hi Hans,
> 
> Sorry for putting this on a back burner for a while.
> 
> On 08/07/2013 07:49 PM, Hans Verkuil wrote:
>> On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
>>> Hi Hans,
>>>
>>> On 08/02/2013 03:00 PM, Hans Verkuil wrote:
>>>> Hi Sylwester,
>>>>
>>>> The patch is good, but I have some issues with the commit message itself.
>>>
>>> Thanks a lot for the detailed explanation, I just wrote this a bit
>>> longish changelog to possibly get some feedback and to better understand
>>> what is exactly going on. Currently the v4l2-core looks like a racing
>>> disaster to me.
>>>
>>>> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
>>>>> As it currently stands this code doesn't protect against any races
>>>>> between video device open() and its unregistration. Races could be
>>>>> avoided by doing the video_is_registered() check protected by the
>>>>> core mutex, while the video device unregistration is also done with
>>>>> this mutex held.
>>>>
>>>> The video_unregister_device() is called completely asynchronously,
>>>> particularly in the case of usb drivers. So it was never the goal of
>>>> the video_is_registered() to be fool proof, since that isn't possible,
>>>> nor is that necessary.
>>>>
>>>> The goal was that the v4l2 core would use it for the various file
>>>> operations and ioctls as a quick check whether the device was unregistered
>>>> and to return the correct error. This prevents drivers from having to do
>>>> the same thing.
>>>
>>> OK, I think I just myself used this video_is_registered() flag for some
>>> more stuff, by surrounding it with mutex_lock/mutex_unlock and putting
>>> more stuff in between, like media_entity_cleanup().
>>
>> You can't do that, because there are most likely still filehandles open
>> or even ioctls being executed. Cleanup happens in the release function(s)
>> when the kref goes to 0.
> 
> Yeah, a bit late, but I'm finally well aware of that.
> 
>>> And this probably led
>>> me astray for a while, thinking that video_is_registered() was intended
>>> to be used synchronously.
>>> For example see fimc_lite_subdev_unregistered() in drivers/media/platform/
>>> exynos4-is/fimc-lite.c.
>>>
>>> But as you said video_is_registered() is fully asynchronous.
>>>
>>> Actually I'm trying to fix a nasty race between deferred driver probing
>>> and video device open(). The problem is that video open() callback can
>>> be called after driver remove() callback was invoked.
>>
>> How is that possible? The video_device_register must be the last thing in
>> the probe function. If it succeeds, then the probe must succeed as well.
>>
>> Note that I now realize that this might fail in the case of multiple device
>> nodes being registered. We never had problems with that in the past, but in
>> theory you can the race condition you mention in that scenario. The correct
>> approach here would probably be to always return 0 in probe() if only some
>> of the video_device_register calls fail.
>>
>> Anyway, assuming that only one device node is created, then I can't see how
>> you can get a race condition here. Any open() call will increase the module's
>> refcount, making it impossible to unload.
> 
> The main issue is that in the exynos4-is driver there are multiple platform
> devices (these represent IP blocks that are scattered across various Exynos
> SoC revisions). Drivers of this platform devices create subdevs and video
> nodes are registered in subdev's .registered() callback. This allows the
> drivers to be more modular and self contained. But also during video device
> registration proper struct v4l2_device (and through this struct 
> media_device)
> can be passed so the video nodes are attached to the common media driver.
> 
> In probe() of the media driver all subdevs are registered and this triggers
> video nodes creation. The media device first registers all platform devices.
> Subdevs are created and all video nodes. After that are being registered
> sensor subdevs and media links are created. Then all subdev devnodes are
> created in a single call. Note this single call is a bit contrary to the
> v4l2-sync API concepts.

It wouldn't be difficult to add a v4l2_device_register_subdev_node() function.
The v4l2_device_register_subdev_nodes() function predates v4l2-sync, so if
it needs some tweaking to make it behave better with v4l2-sync, then that's
no problem.

> So there is lots of things that may fail after first video node is 
> registered,
> and on which open() may be called immediately.

The only solution I know off-hand to handle this safely is to unregister all
nodes if some fail, but to return 0 in the probe function. If an open() succeeded,
then that will just work until the node is closed, at which point the v4l2_device
release() is called and you can cleanup.

What does alsa do with multiple node creation?

>> As far as I can tell, once you call rmmod it should no longer be possible to
>> open() an device node whose struct file_operations owner is that module (i.e.
>> the owner field of the file_operations struct points to that module). Looking
>> at the way fs/char_dev is implemented, that seems to be correctly handled by
>> the kernel core.
> 
> Yes, that's a good news. There is only still the issue with the "unbind" 
> sysfs
> attribute. I couldn't see any similar checks done around code 
> implementing it.

That seems to me to be a core code issue and should be solved there.

>>> This issue is actually not only related to deferred probing. It can be
>>> also triggered by driver module removal or through driver's sysfs "unbind"
>>> attribute.
>>>
>>> Let's assume following scenario:
>>>
>>> - a driver module is loaded
>>> - driver probe is called, it registers video device,
>>> - udev opens /dev/video
>>> - after mutex_unlock(&videodev_lock); call in v4l2_open() in v4l2-core/
>>>    v4l2-dev.c something fails in probe()
>>
>> And that shouldn't happen. That's the crucial bit: under which scenario does
>> this happen for you? If there is a control path where you do create a working
>> device node, but the probe fails, then that will indeed cause all sorts of
>> problems. But it shouldn't get in that situation (except I think in the case
>> of multiple device nodes, which is something I need to think about).
> 
> Yes, I'm dealing with multiple device nodes created from within media 
> driver's
> probe(). As described above, there is quite a few things that could 
> fail, even
> single sub-driver created multiple nodes for same IP block (capture, 
> mem-to-mem).

Is this 'could fail', or 'I have seen it fail'? I have never seen problems in probe()
with node creation. The only reason I know of why creating a node might fail is
being out-of-memory.

> 
>>> and it unwinds, probe callback
>>>    exits and the driver code code calls dev_set_drvdata(dev, NULL); as
>>>    shown below.
>>>
>>> static int really_probe(struct device *dev, struct device_driver *drv)
>>> {
>>> 	...
>>> 	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
>>> 		 drv->bus->name, __func__, drv->name, dev_name(dev));
>>> 	...
>>> 	if (dev->bus->probe) {
>>> 		ret = dev->bus->probe(dev);
>>> 		if (ret)
>>> 			goto probe_failed;
>>> 	} else if (drv->probe) {
>>> 		ret = drv->probe(dev);
>>> 		if (ret)
>>> 			goto probe_failed;
>>> 	}
>>> 	...
>>> 	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
>>> 		 drv->bus->name, __func__, dev_name(dev), drv->name);
>>> 	goto done;
>>>
>>> probe_failed:
>>> 	devres_release_all(dev);
>>> 	driver_sysfs_remove(dev);
>>> 	dev->driver = NULL;
>>> 	dev_set_drvdata(dev, NULL);
>>>
>>> 	...
>>> 	ret = 0;
>>> done:
>>> 	...
>>> 	return ret;
>>> }
>>>
>>> Now we get to
>>>
>>>   	ret = vdev->fops->open(filp);
>>>
>>> in v4l2_open(). This calls some driver's callback, e.g. something
>>> like:
>>>
>>> static int fimc_lite_open(struct file *file)
>>> {
>>> 	struct fimc_lite *fimc = video_drvdata(file);
>>> 	struct media_entity *me =&fimc->ve.vdev.entity;
>>> 	int ret;
>>>
>>> 	mutex_lock(&fimc->lock);
>>> 	if (!video_is_registered(&fimc->ve.vdev)) {
>>> 		ret = -ENODEV;
>>> 		goto unlock;
>>> 	}
>>>
>>> 	...
>>>
>>> 	/* Mark video pipeline ending at this video node as in use. */
>>> 	if (ret == 0)
>>> 		me->use_count++;
>>> 	...
>>> unlock:
>>> 	mutex_unlock(&fimc->lock);
>>> 	return ret;
>>> }
>>>
>>> Now what will video_drvdata(file); return ?
>>>
>>> static inline void *video_drvdata(struct file *file)
>>> {
>>> 	return video_get_drvdata(video_devdata(file));
>>> }
>>>
>>> static inline void *video_get_drvdata(struct video_device *vdev)
>>> {
>>> 	return dev_get_drvdata(&vdev->dev);
>>> }
>>>
>>> Yes, so that will be just NULL o_O, due to the dev_set_drvdata(dev, NULL);
>>> in really_probe(). drvdata is cleared similarly in __device_release_driver(),
>>> right after calling driver's remove handler.
>>>
>>> Another issue I have is that, e.g. driver/media/platform/exynos4-is/fimc-lite*
>>> driver has empty video dev release() callback. It should be implemented
>>> in the driver to kfree the whole driver's private data structure where
>>> struct video_device is embedded in (struct fimc_lite). But that freeing
>>> really needs to be synchronized with driver's remove() call, since there
>>> is e.g. freed interrupt which accesses the driver's private data. I can't
>>> use kref from struct v4l2_device as that belongs to a different driver.
>>> A driver's custom reference counting comes to mind, where vdev->release()
>>> and drv->remove() would be decrementing the reference counter. But that
>>> seems ugly as hell :/ And it predates devm_*.
>>>
>>> This is all getting a bit depressing :/ Deferred probing and the
>>> asynchronous subdev handling just made those issues more visible, i.e.
>>> not very good design of some parts of the v4l2-core.
>>
>> It's just not clear to me how exactly things go wrong for you. Ping me on irc
>> tomorrow and we can discuss it further. I have reworked refcounting in the
>> past (at the time it was *really* bad), so perhaps we need to rework it again,
>> particularly with video nodes associated with subdevices in the mix, something
>> that didn't exist at the time.
> 
> Thanks, and sorry for holding on with that for too long.
> 
> The main issue as I see it is that we need to track both driver remove() 
> and
> struct device .release() calls and free resources only when last of them
> executes. Data structures which are referenced in fops must not be freed in
> remove() and we cannot use dev_get_drvdata() in fops, e.g. not protected 
> with
> device_lock().

You can do all that by returning 0 if probe() was partially successful (i.e.
one or more, but not all, nodes were created successfully) by doing what I
described above. I don't see another way that doesn't introduce a race condition.

That doesn't mean that there isn't one, it's just that I don't know of a better
way of doing this.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-09  9:07         ` Hans Verkuil
@ 2013-09-09 10:00           ` Laurent Pinchart
  2013-09-09 10:07             ` Hans Verkuil
  2013-09-11 13:07           ` Sylwester Nawrocki
  1 sibling, 1 reply; 18+ messages in thread
From: Laurent Pinchart @ 2013-09-09 10:00 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Sylwester Nawrocki, Sylwester Nawrocki, linux-media, Kyungmin Park

Hi Hans,

On Monday 09 September 2013 11:07:43 Hans Verkuil wrote:
> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
> > On 08/07/2013 07:49 PM, Hans Verkuil wrote:
> >> On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
> >>> On 08/02/2013 03:00 PM, Hans Verkuil wrote:
> >>>> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:

[snip]

> > The main issue as I see it is that we need to track both driver remove()
> > and struct device .release() calls and free resources only when last of
> > them executes. Data structures which are referenced in fops must not be
> > freed in remove() and we cannot use dev_get_drvdata() in fops, e.g. not
> > protected with device_lock().
> 
> You can do all that by returning 0 if probe() was partially successful (i.e.
> one or more, but not all, nodes were created successfully) by doing what I
> described above. I don't see another way that doesn't introduce a race
> condition.

But isn't this just plain wrong ? If probing fails, I don't see how returning 
success could be a good idea.

> That doesn't mean that there isn't one, it's just that I don't know of a
> better way of doing this.

We might need support from the device core.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-09 10:00           ` Laurent Pinchart
@ 2013-09-09 10:07             ` Hans Verkuil
  2013-09-09 10:10               ` Laurent Pinchart
  0 siblings, 1 reply; 18+ messages in thread
From: Hans Verkuil @ 2013-09-09 10:07 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sylwester Nawrocki, Sylwester Nawrocki, linux-media, Kyungmin Park

On 09/09/2013 12:00 PM, Laurent Pinchart wrote:
> Hi Hans,
> 
> On Monday 09 September 2013 11:07:43 Hans Verkuil wrote:
>> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
>>> On 08/07/2013 07:49 PM, Hans Verkuil wrote:
>>>> On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
>>>>> On 08/02/2013 03:00 PM, Hans Verkuil wrote:
>>>>>> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
> 
> [snip]
> 
>>> The main issue as I see it is that we need to track both driver remove()
>>> and struct device .release() calls and free resources only when last of
>>> them executes. Data structures which are referenced in fops must not be
>>> freed in remove() and we cannot use dev_get_drvdata() in fops, e.g. not
>>> protected with device_lock().
>>
>> You can do all that by returning 0 if probe() was partially successful (i.e.
>> one or more, but not all, nodes were created successfully) by doing what I
>> described above. I don't see another way that doesn't introduce a race
>> condition.
> 
> But isn't this just plain wrong ? If probing fails, I don't see how returning 
> success could be a good idea.

Well, the nodes that are created are working fine. So it's partially OK :-)

That said, yes it would be better if it could safely clean up and return an error.
But it is better than returning an error and introducing a race condition.

>> That doesn't mean that there isn't one, it's just that I don't know of a
>> better way of doing this.
> 
> We might need support from the device core.
> 

I do come back to my main question: has anyone actually experienced this error in a
realistic scenario? Other than in very low-memory situations I cannot imagine this
happening. I'm not sure whether you want to spend a lot of time trying to fix this
all perfectly. That's why I am suggesting just unregistering everything and returning
0 in probe(). Not ideal, but at least it's safe (as far as I can tell).

Regards,

	Hans

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-09 10:07             ` Hans Verkuil
@ 2013-09-09 10:10               ` Laurent Pinchart
  2013-09-09 10:17                 ` Hans Verkuil
  0 siblings, 1 reply; 18+ messages in thread
From: Laurent Pinchart @ 2013-09-09 10:10 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Sylwester Nawrocki, Sylwester Nawrocki, linux-media, Kyungmin Park

Hi Hans,

On Monday 09 September 2013 12:07:18 Hans Verkuil wrote:
> On 09/09/2013 12:00 PM, Laurent Pinchart wrote:
> > On Monday 09 September 2013 11:07:43 Hans Verkuil wrote:
> >> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
> >
> > [snip]
> > 
> >>> The main issue as I see it is that we need to track both driver remove()
> >>> and struct device .release() calls and free resources only when last of
> >>> them executes. Data structures which are referenced in fops must not be
> >>> freed in remove() and we cannot use dev_get_drvdata() in fops, e.g. not
> >>> protected with device_lock().
> >> 
> >> You can do all that by returning 0 if probe() was partially successful
> >> (i.e. one or more, but not all, nodes were created successfully) by
> >> doing what I described above. I don't see another way that doesn't
> >> introduce a race condition.
> > 
> > But isn't this just plain wrong ? If probing fails, I don't see how
> > returning success could be a good idea.
> 
> Well, the nodes that are created are working fine. So it's partially OK :-)
> 
> That said, yes it would be better if it could safely clean up and return an
> error. But it is better than returning an error and introducing a race
> condition.
>
> >> That doesn't mean that there isn't one, it's just that I don't know of a
> >> better way of doing this.
> > 
> > We might need support from the device core.
> 
> I do come back to my main question: has anyone actually experienced this
> error in a realistic scenario? Other than in very low-memory situations I
> cannot imagine this happening.

What about running out of minors, which could very well happen with subdev 
nodes in complex SoCs ?

> I'm not sure whether you want to spend a lot of time trying to fix this all
> perfectly. That's why I am suggesting just unregistering everything and
> returning 0 in probe(). Not ideal, but at least it's safe (as far as I can
> tell).

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-09 10:10               ` Laurent Pinchart
@ 2013-09-09 10:17                 ` Hans Verkuil
  2013-09-09 10:24                   ` Laurent Pinchart
  0 siblings, 1 reply; 18+ messages in thread
From: Hans Verkuil @ 2013-09-09 10:17 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sylwester Nawrocki, Sylwester Nawrocki, linux-media, Kyungmin Park

On 09/09/2013 12:10 PM, Laurent Pinchart wrote:
> Hi Hans,
> 
> On Monday 09 September 2013 12:07:18 Hans Verkuil wrote:
>> On 09/09/2013 12:00 PM, Laurent Pinchart wrote:
>>> On Monday 09 September 2013 11:07:43 Hans Verkuil wrote:
>>>> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
>>>
>>> [snip]
>>>
>>>>> The main issue as I see it is that we need to track both driver remove()
>>>>> and struct device .release() calls and free resources only when last of
>>>>> them executes. Data structures which are referenced in fops must not be
>>>>> freed in remove() and we cannot use dev_get_drvdata() in fops, e.g. not
>>>>> protected with device_lock().
>>>>
>>>> You can do all that by returning 0 if probe() was partially successful
>>>> (i.e. one or more, but not all, nodes were created successfully) by
>>>> doing what I described above. I don't see another way that doesn't
>>>> introduce a race condition.
>>>
>>> But isn't this just plain wrong ? If probing fails, I don't see how
>>> returning success could be a good idea.
>>
>> Well, the nodes that are created are working fine. So it's partially OK :-)
>>
>> That said, yes it would be better if it could safely clean up and return an
>> error. But it is better than returning an error and introducing a race
>> condition.
>>
>>>> That doesn't mean that there isn't one, it's just that I don't know of a
>>>> better way of doing this.
>>>
>>> We might need support from the device core.
>>
>> I do come back to my main question: has anyone actually experienced this
>> error in a realistic scenario? Other than in very low-memory situations I
>> cannot imagine this happening.
> 
> What about running out of minors, which could very well happen with subdev 
> nodes in complex SoCs ?

Is that really realistic? What's the worst-case SoC we have in terms of
device nodes? Frankly, if this might happen then we should allow for more
minors or make the minor allocation completely dynamic.

If you run into this situation then you have bigger problems than a potential
race condition.

Regards,

	Hans


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-09 10:17                 ` Hans Verkuil
@ 2013-09-09 10:24                   ` Laurent Pinchart
  2013-09-09 10:37                     ` Hans Verkuil
  0 siblings, 1 reply; 18+ messages in thread
From: Laurent Pinchart @ 2013-09-09 10:24 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Sylwester Nawrocki, Sylwester Nawrocki, linux-media, Kyungmin Park

Hi Hans,

On Monday 09 September 2013 12:17:34 Hans Verkuil wrote:
> On 09/09/2013 12:10 PM, Laurent Pinchart wrote:
> > On Monday 09 September 2013 12:07:18 Hans Verkuil wrote:
> >> On 09/09/2013 12:00 PM, Laurent Pinchart wrote:
> >>> On Monday 09 September 2013 11:07:43 Hans Verkuil wrote:
> >>>> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
> >>> [snip]
> >>> 
> >>>>> The main issue as I see it is that we need to track both driver
> >>>>> remove() and struct device .release() calls and free resources only
> >>>>> when last of them executes. Data structures which are referenced in
> >>>>> fops must not be freed in remove() and we cannot use dev_get_drvdata()
> >>>>> in fops, e.g. not protected with device_lock().
> >>>> 
> >>>> You can do all that by returning 0 if probe() was partially successful
> >>>> (i.e. one or more, but not all, nodes were created successfully) by
> >>>> doing what I described above. I don't see another way that doesn't
> >>>> introduce a race condition.
> >>> 
> >>> But isn't this just plain wrong ? If probing fails, I don't see how
> >>> returning success could be a good idea.
> >> 
> >> Well, the nodes that are created are working fine. So it's partially OK
> >> :-)
> >> 
> >> That said, yes it would be better if it could safely clean up and return
> >> an error. But it is better than returning an error and introducing a race
> >> condition.
> >> 
> >>>> That doesn't mean that there isn't one, it's just that I don't know of
> >>>> a better way of doing this.
> >>> 
> >>> We might need support from the device core.
> >> 
> >> I do come back to my main question: has anyone actually experienced this
> >> error in a realistic scenario? Other than in very low-memory situations I
> >> cannot imagine this happening.
> > 
> > What about running out of minors, which could very well happen with subdev
> > nodes in complex SoCs ?
> 
> Is that really realistic? What's the worst-case SoC we have in terms of
> device nodes? Frankly, if this might happen then we should allow for more
> minors or make the minor allocation completely dynamic.

For the 4 VSP1 instances on the R-Car H2, I need 33 video nodes and 65 (if I'm 
not mistaken) subdev nodes. That doesn't include the camera interface.

On a side note, this seems to indicate that the subdev API should probably 
move to the /dev/media device node. That's something else to discuss.

> If you run into this situation then you have bigger problems than a
> potential race condition.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-09 10:24                   ` Laurent Pinchart
@ 2013-09-09 10:37                     ` Hans Verkuil
  2013-09-09 10:48                       ` Hans Verkuil
  0 siblings, 1 reply; 18+ messages in thread
From: Hans Verkuil @ 2013-09-09 10:37 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sylwester Nawrocki, Sylwester Nawrocki, linux-media, Kyungmin Park

On 09/09/2013 12:24 PM, Laurent Pinchart wrote:
> Hi Hans,
> 
> On Monday 09 September 2013 12:17:34 Hans Verkuil wrote:
>> On 09/09/2013 12:10 PM, Laurent Pinchart wrote:
>>> On Monday 09 September 2013 12:07:18 Hans Verkuil wrote:
>>>> On 09/09/2013 12:00 PM, Laurent Pinchart wrote:
>>>>> On Monday 09 September 2013 11:07:43 Hans Verkuil wrote:
>>>>>> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
>>>>> [snip]
>>>>>
>>>>>>> The main issue as I see it is that we need to track both driver
>>>>>>> remove() and struct device .release() calls and free resources only
>>>>>>> when last of them executes. Data structures which are referenced in
>>>>>>> fops must not be freed in remove() and we cannot use dev_get_drvdata()
>>>>>>> in fops, e.g. not protected with device_lock().
>>>>>>
>>>>>> You can do all that by returning 0 if probe() was partially successful
>>>>>> (i.e. one or more, but not all, nodes were created successfully) by
>>>>>> doing what I described above. I don't see another way that doesn't
>>>>>> introduce a race condition.
>>>>>
>>>>> But isn't this just plain wrong ? If probing fails, I don't see how
>>>>> returning success could be a good idea.
>>>>
>>>> Well, the nodes that are created are working fine. So it's partially OK
>>>> :-)
>>>>
>>>> That said, yes it would be better if it could safely clean up and return
>>>> an error. But it is better than returning an error and introducing a race
>>>> condition.
>>>>
>>>>>> That doesn't mean that there isn't one, it's just that I don't know of
>>>>>> a better way of doing this.
>>>>>
>>>>> We might need support from the device core.
>>>>
>>>> I do come back to my main question: has anyone actually experienced this
>>>> error in a realistic scenario? Other than in very low-memory situations I
>>>> cannot imagine this happening.
>>>
>>> What about running out of minors, which could very well happen with subdev
>>> nodes in complex SoCs ?
>>
>> Is that really realistic? What's the worst-case SoC we have in terms of
>> device nodes? Frankly, if this might happen then we should allow for more
>> minors or make the minor allocation completely dynamic.
> 
> For the 4 VSP1 instances on the R-Car H2, I need 33 video nodes and 65 (if I'm 
> not mistaken) subdev nodes. That doesn't include the camera interface.

So that leaves 158 free minors. That's a lot of webcams that can be attached :-)

> On a side note, this seems to indicate that the subdev API should probably 
> move to the /dev/media device node. That's something else to discuss.

I have 70 tty nodes in /dev. Just because there are a lot of them doesn't mean that
we have to merge them somehow. There may be other arguments for changing how we
handle them, but 'there are a lot of them' isn't a good argument, IMHO.

Regards,

	Hans

>> If you run into this situation then you have bigger problems than a
>> potential race condition.
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-09 10:37                     ` Hans Verkuil
@ 2013-09-09 10:48                       ` Hans Verkuil
  0 siblings, 0 replies; 18+ messages in thread
From: Hans Verkuil @ 2013-09-09 10:48 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Sylwester Nawrocki, Sylwester Nawrocki, linux-media, Kyungmin Park

On 09/09/2013 12:37 PM, Hans Verkuil wrote:
> On 09/09/2013 12:24 PM, Laurent Pinchart wrote:
>> Hi Hans,
>>
>> On Monday 09 September 2013 12:17:34 Hans Verkuil wrote:
>>> On 09/09/2013 12:10 PM, Laurent Pinchart wrote:
>>>> On Monday 09 September 2013 12:07:18 Hans Verkuil wrote:
>>>>> On 09/09/2013 12:00 PM, Laurent Pinchart wrote:
>>>>>> On Monday 09 September 2013 11:07:43 Hans Verkuil wrote:
>>>>>>> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
>>>>>> [snip]
>>>>>>
>>>>>>>> The main issue as I see it is that we need to track both driver
>>>>>>>> remove() and struct device .release() calls and free resources only
>>>>>>>> when last of them executes. Data structures which are referenced in
>>>>>>>> fops must not be freed in remove() and we cannot use dev_get_drvdata()
>>>>>>>> in fops, e.g. not protected with device_lock().
>>>>>>>
>>>>>>> You can do all that by returning 0 if probe() was partially successful
>>>>>>> (i.e. one or more, but not all, nodes were created successfully) by
>>>>>>> doing what I described above. I don't see another way that doesn't
>>>>>>> introduce a race condition.
>>>>>>
>>>>>> But isn't this just plain wrong ? If probing fails, I don't see how
>>>>>> returning success could be a good idea.
>>>>>
>>>>> Well, the nodes that are created are working fine. So it's partially OK
>>>>> :-)
>>>>>
>>>>> That said, yes it would be better if it could safely clean up and return
>>>>> an error. But it is better than returning an error and introducing a race
>>>>> condition.
>>>>>
>>>>>>> That doesn't mean that there isn't one, it's just that I don't know of
>>>>>>> a better way of doing this.
>>>>>>
>>>>>> We might need support from the device core.
>>>>>
>>>>> I do come back to my main question: has anyone actually experienced this
>>>>> error in a realistic scenario? Other than in very low-memory situations I
>>>>> cannot imagine this happening.
>>>>
>>>> What about running out of minors, which could very well happen with subdev
>>>> nodes in complex SoCs ?
>>>
>>> Is that really realistic? What's the worst-case SoC we have in terms of
>>> device nodes? Frankly, if this might happen then we should allow for more
>>> minors or make the minor allocation completely dynamic.
>>
>> For the 4 VSP1 instances on the R-Car H2, I need 33 video nodes and 65 (if I'm 
>> not mistaken) subdev nodes. That doesn't include the camera interface.
> 
> So that leaves 158 free minors. That's a lot of webcams that can be attached :-)

Just for the record: I would have no objection whatsoever if we increase
VIDEO_NUM_DEVICES to 512.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-09  9:07         ` Hans Verkuil
  2013-09-09 10:00           ` Laurent Pinchart
@ 2013-09-11 13:07           ` Sylwester Nawrocki
  2013-09-11 14:01             ` Hans Verkuil
  1 sibling, 1 reply; 18+ messages in thread
From: Sylwester Nawrocki @ 2013-09-11 13:07 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Sylwester Nawrocki, linux-media, laurent.pinchart, Kyungmin Park,
	Mark Brown, Andrzej Hajda

On 09/09/2013 11:07 AM, Hans Verkuil wrote:
> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
>> On 08/07/2013 07:49 PM, Hans Verkuil wrote:
>>> On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
>>>> On 08/02/2013 03:00 PM, Hans Verkuil wrote:
>>>>> Hi Sylwester,
>>>>>
>>>>> The patch is good, but I have some issues with the commit message itself.
>>>>
>>>> Thanks a lot for the detailed explanation, I just wrote this a bit
>>>> longish changelog to possibly get some feedback and to better understand
>>>> what is exactly going on. Currently the v4l2-core looks like a racing
>>>> disaster to me.
>>>>
>>>>> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
>>>>>> As it currently stands this code doesn't protect against any races
>>>>>> between video device open() and its unregistration. Races could be
>>>>>> avoided by doing the video_is_registered() check protected by the
>>>>>> core mutex, while the video device unregistration is also done with
>>>>>> this mutex held.
>>>>>
>>>>> The video_unregister_device() is called completely asynchronously,
>>>>> particularly in the case of usb drivers. So it was never the goal of
>>>>> the video_is_registered() to be fool proof, since that isn't possible,
>>>>> nor is that necessary.
>>>>>
>>>>> The goal was that the v4l2 core would use it for the various file
>>>>> operations and ioctls as a quick check whether the device was unregistered
>>>>> and to return the correct error. This prevents drivers from having to do
>>>>> the same thing.
>>>>
>>>> OK, I think I just myself used this video_is_registered() flag for some
>>>> more stuff, by surrounding it with mutex_lock/mutex_unlock and putting
>>>> more stuff in between, like media_entity_cleanup().
>>>
>>> You can't do that, because there are most likely still filehandles open
>>> or even ioctls being executed. Cleanup happens in the release function(s)
>>> when the kref goes to 0.
>>
>> Yeah, a bit late, but I'm finally well aware of that.
>>
>>>> And this probably led
>>>> me astray for a while, thinking that video_is_registered() was intended
>>>> to be used synchronously.
>>>> For example see fimc_lite_subdev_unregistered() in drivers/media/platform/
>>>> exynos4-is/fimc-lite.c.
>>>>
>>>> But as you said video_is_registered() is fully asynchronous.
>>>>
>>>> Actually I'm trying to fix a nasty race between deferred driver probing
>>>> and video device open(). The problem is that video open() callback can
>>>> be called after driver remove() callback was invoked.
>>>
>>> How is that possible? The video_device_register must be the last thing in
>>> the probe function. If it succeeds, then the probe must succeed as well.
>>>
>>> Note that I now realize that this might fail in the case of multiple device
>>> nodes being registered. We never had problems with that in the past, but in
>>> theory you can the race condition you mention in that scenario. The correct
>>> approach here would probably be to always return 0 in probe() if only some
>>> of the video_device_register calls fail.
>>>
>>> Anyway, assuming that only one device node is created, then I can't see how
>>> you can get a race condition here. Any open() call will increase the module's
>>> refcount, making it impossible to unload.
>>
>> The main issue is that in the exynos4-is driver there are multiple platform
>> devices (these represent IP blocks that are scattered across various Exynos
>> SoC revisions). Drivers of this platform devices create subdevs and video
>> nodes are registered in subdev's .registered() callback. This allows the
>> drivers to be more modular and self contained. But also during video device
>> registration proper struct v4l2_device (and through this struct 
>> media_device)
>> can be passed so the video nodes are attached to the common media driver.
>>
>> In probe() of the media driver all subdevs are registered and this triggers
>> video nodes creation. The media device first registers all platform devices.
>> Subdevs are created and all video nodes. After that are being registered
>> sensor subdevs and media links are created. Then all subdev devnodes are
>> created in a single call. Note this single call is a bit contrary to the
>> v4l2-sync API concepts.
> 
> It wouldn't be difficult to add a v4l2_device_register_subdev_node() function.
> The v4l2_device_register_subdev_nodes() function predates v4l2-sync, so if
> it needs some tweaking to make it behave better with v4l2-sync, then that's
> no problem.

Yeah, I think it will need to be added sooner or later.

>> So there is lots of things that may fail after first video node is 
>> registered, and on which open() may be called immediately.
> 
> The only solution I know off-hand to handle this safely is to unregister all
> nodes if some fail, but to return 0 in the probe function. If an open() succeeded,
> then that will just work until the node is closed, at which point the v4l2_device
> release() is called and you can cleanup.

Another solution would be to properly implement struct video_device::release()
callback and in video device open() do video_is_registered() check while
holding the driver private video device lock. Also video_unregister_device()
would need to be called while holding the video device lock.
Then some devm_* calls would need to be removed from drivers, as we wanted to
have driver private data structure released on in video_device::release(), not
after failed driver probe() or after remove() call.

As a side note, in some previous posts a had been confusing struct device embedded 
in struct video_device with struct device of a physical device, e.g. associated
with struct platform_device. As video_get_drvdata() uses the former it should be
safe to  use this helper in open(), even without holding the driver private lock.

> What does alsa do with multiple node creation?

Not sure, maybe Mark could shed some light on that (I've added him at Cc).

>>> As far as I can tell, once you call rmmod it should no longer be possible to
>>> open() an device node whose struct file_operations owner is that module (i.e.
>>> the owner field of the file_operations struct points to that module). Looking
>>> at the way fs/char_dev is implemented, that seems to be correctly handled by
>>> the kernel core.
>>
>> Yes, that's a good news. There is only still the issue with the "unbind" 
>> sysfs attribute. I couldn't see any similar checks done around code 
>> implementing it.
> 
> That seems to me to be a core code issue and should be solved there.

Yeah, as I noted below, it has been tried already [1].

>>>> This issue is actually not only related to deferred probing. It can be
>>>> also triggered by driver module removal or through driver's sysfs "unbind"
>>>> attribute.
>>>>
>>>> Let's assume following scenario:
>>>>
>>>> - a driver module is loaded
>>>> - driver probe is called, it registers video device,
>>>> - udev opens /dev/video
>>>> - after mutex_unlock(&videodev_lock); call in v4l2_open() in v4l2-core/
>>>>    v4l2-dev.c something fails in probe()
>>>
>>> And that shouldn't happen. That's the crucial bit: under which scenario does
>>> this happen for you? If there is a control path where you do create a working
>>> device node, but the probe fails, then that will indeed cause all sorts of
>>> problems. But it shouldn't get in that situation (except I think in the case
>>> of multiple device nodes, which is something I need to think about).
>>
>> Yes, I'm dealing with multiple device nodes created from within media 
>> driver's
>> probe(). As described above, there is quite a few things that could 
>> fail, even
>> single sub-driver created multiple nodes for same IP block (capture, 
>> mem-to-mem).
> 
> Is this 'could fail', or 'I have seen it fail'? I have never seen problems in probe()
> with node creation. The only reason I know of why creating a node might fail is
> being out-of-memory.

In my case it was top level driver that was triggering device node creation
in its probe() and if, e.g. some of sub-device's driver was missing, it called,
through subdev internal unregistered(), op video_unregister_device(), but also
media_entity_cleanup() which seemed to be the source of trouble.

So if we ignore the sysfs "unbind" issue the problem might not be that severe.
Not sure if we should. Nevertheless most of drivers crash kernel currently
when "unbind" is used on them. E.g. I could unbind unregister the system power
management device (PMIC) driver, the regulator core warned about regulators 
in use being removed and then cpufreq killed the system completely due to a
dangling pointer to a regulator object.

>>>> and it unwinds, probe callback
>>>>    exits and the driver code code calls dev_set_drvdata(dev, NULL); as
>>>>    shown below.
>>>>
>>>> static int really_probe(struct device *dev, struct device_driver *drv)
>>>> {
>>>> 	...
>>>> 	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
>>>> 		 drv->bus->name, __func__, drv->name, dev_name(dev));
>>>> 	...
>>>> 	if (dev->bus->probe) {
>>>> 		ret = dev->bus->probe(dev);
>>>> 		if (ret)
>>>> 			goto probe_failed;
>>>> 	} else if (drv->probe) {
>>>> 		ret = drv->probe(dev);
>>>> 		if (ret)
>>>> 			goto probe_failed;
>>>> 	}
>>>> 	...
>>>> 	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
>>>> 		 drv->bus->name, __func__, dev_name(dev), drv->name);
>>>> 	goto done;
>>>>
>>>> probe_failed:
>>>> 	devres_release_all(dev);
>>>> 	driver_sysfs_remove(dev);
>>>> 	dev->driver = NULL;
>>>> 	dev_set_drvdata(dev, NULL);
>>>>
>>>> 	...
>>>> 	ret = 0;
>>>> done:
>>>> 	...
>>>> 	return ret;
>>>> }
>>>>
>>>> Now we get to
>>>>
>>>>   	ret = vdev->fops->open(filp);
>>>>
>>>> in v4l2_open(). This calls some driver's callback, e.g. something
>>>> like:
>>>>
>>>> static int fimc_lite_open(struct file *file)
>>>> {
>>>> 	struct fimc_lite *fimc = video_drvdata(file);
>>>> 	struct media_entity *me =&fimc->ve.vdev.entity;
>>>> 	int ret;
>>>>
>>>> 	mutex_lock(&fimc->lock);
>>>> 	if (!video_is_registered(&fimc->ve.vdev)) {
>>>> 		ret = -ENODEV;
>>>> 		goto unlock;
>>>> 	}
>>>>
>>>> 	...
>>>>
>>>> 	/* Mark video pipeline ending at this video node as in use. */
>>>> 	if (ret == 0)
>>>> 		me->use_count++;
>>>> 	...
>>>> unlock:
>>>> 	mutex_unlock(&fimc->lock);
>>>> 	return ret;
>>>> }
>>>>
>>>> Now what will video_drvdata(file); return ?
>>>>
>>>> static inline void *video_drvdata(struct file *file)
>>>> {
>>>> 	return video_get_drvdata(video_devdata(file));
>>>> }
>>>>
>>>> static inline void *video_get_drvdata(struct video_device *vdev)
>>>> {
>>>> 	return dev_get_drvdata(&vdev->dev);
>>>> }
>>>>
>>>> Yes, so that will be just NULL o_O, due to the dev_set_drvdata(dev, NULL);
>>>> in really_probe(). drvdata is cleared similarly in __device_release_driver(),
>>>> right after calling driver's remove handler.
>>>>
>>>> Another issue I have is that, e.g. driver/media/platform/exynos4-is/fimc-lite*
>>>> driver has empty video dev release() callback. It should be implemented
>>>> in the driver to kfree the whole driver's private data structure where
>>>> struct video_device is embedded in (struct fimc_lite). But that freeing
>>>> really needs to be synchronized with driver's remove() call, since there
>>>> is e.g. freed interrupt which accesses the driver's private data. I can't
>>>> use kref from struct v4l2_device as that belongs to a different driver.
>>>> A driver's custom reference counting comes to mind, where vdev->release()
>>>> and drv->remove() would be decrementing the reference counter. But that
>>>> seems ugly as hell :/ And it predates devm_*.
>>>>
>>>> This is all getting a bit depressing :/ Deferred probing and the
>>>> asynchronous subdev handling just made those issues more visible, i.e.
>>>> not very good design of some parts of the v4l2-core.
>>>
>>> It's just not clear to me how exactly things go wrong for you. Ping me on irc
>>> tomorrow and we can discuss it further. I have reworked refcounting in the
>>> past (at the time it was *really* bad), so perhaps we need to rework it again,
>>> particularly with video nodes associated with subdevices in the mix, something
>>> that didn't exist at the time.
>>
>> Thanks, and sorry for holding on with that for too long.
>>
>> The main issue as I see it is that we need to track both driver remove() 
>> and
>> struct device .release() calls and free resources only when last of them
>> executes. Data structures which are referenced in fops must not be freed in
>> remove() and we cannot use dev_get_drvdata() in fops, e.g. not protected 
>> with device_lock().
> 
> You can do all that by returning 0 if probe() was partially successful (i.e.
> one or more, but not all, nodes were created successfully) by doing what I
> described above. I don't see another way that doesn't introduce a race condition.

Yes, that could be a work around for the problem. However it doesn't seem right 
to assume at the subsystem level. For example errors like EPROBE_DEFER should 
be propagated transparently by drivers so the driver core retries probing. 

It is all easy to trigger with the sysfs unbind feature. Regarding fixing 
it at the driver core, it seems there have been brave that tried it already 
[1]. :) It's surprising how poorly this feature is handled by many drivers or 
even subsystems.

> That doesn't mean that there isn't one, it's just that I don't know of a better
> way of doing this.
> 
> Regards,
> 
> 	Hans

[1] https://lkml.org/lkml/2011/6/13/185 

--
Regards,
Sylwester

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-11 13:07           ` Sylwester Nawrocki
@ 2013-09-11 14:01             ` Hans Verkuil
  2013-09-12 10:19               ` Sylwester Nawrocki
  0 siblings, 1 reply; 18+ messages in thread
From: Hans Verkuil @ 2013-09-11 14:01 UTC (permalink / raw)
  To: Sylwester Nawrocki
  Cc: Sylwester Nawrocki, linux-media, laurent.pinchart, Kyungmin Park,
	Mark Brown, Andrzej Hajda

Hi Sylwester,

On 09/11/2013 03:07 PM, Sylwester Nawrocki wrote:
> On 09/09/2013 11:07 AM, Hans Verkuil wrote:
>> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
>>> On 08/07/2013 07:49 PM, Hans Verkuil wrote:
>>>> On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
>>>>> On 08/02/2013 03:00 PM, Hans Verkuil wrote:
>>>>>> Hi Sylwester,
>>>>>>
>>>>>> The patch is good, but I have some issues with the commit message itself.
>>>>>
>>>>> Thanks a lot for the detailed explanation, I just wrote this a bit
>>>>> longish changelog to possibly get some feedback and to better understand
>>>>> what is exactly going on. Currently the v4l2-core looks like a racing
>>>>> disaster to me.
>>>>>
>>>>>> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
>>>>>>> As it currently stands this code doesn't protect against any races
>>>>>>> between video device open() and its unregistration. Races could be
>>>>>>> avoided by doing the video_is_registered() check protected by the
>>>>>>> core mutex, while the video device unregistration is also done with
>>>>>>> this mutex held.
>>>>>>
>>>>>> The video_unregister_device() is called completely asynchronously,
>>>>>> particularly in the case of usb drivers. So it was never the goal of
>>>>>> the video_is_registered() to be fool proof, since that isn't possible,
>>>>>> nor is that necessary.
>>>>>>
>>>>>> The goal was that the v4l2 core would use it for the various file
>>>>>> operations and ioctls as a quick check whether the device was unregistered
>>>>>> and to return the correct error. This prevents drivers from having to do
>>>>>> the same thing.
>>>>>
>>>>> OK, I think I just myself used this video_is_registered() flag for some
>>>>> more stuff, by surrounding it with mutex_lock/mutex_unlock and putting
>>>>> more stuff in between, like media_entity_cleanup().
>>>>
>>>> You can't do that, because there are most likely still filehandles open
>>>> or even ioctls being executed. Cleanup happens in the release function(s)
>>>> when the kref goes to 0.
>>>
>>> Yeah, a bit late, but I'm finally well aware of that.
>>>
>>>>> And this probably led
>>>>> me astray for a while, thinking that video_is_registered() was intended
>>>>> to be used synchronously.
>>>>> For example see fimc_lite_subdev_unregistered() in drivers/media/platform/
>>>>> exynos4-is/fimc-lite.c.
>>>>>
>>>>> But as you said video_is_registered() is fully asynchronous.
>>>>>
>>>>> Actually I'm trying to fix a nasty race between deferred driver probing
>>>>> and video device open(). The problem is that video open() callback can
>>>>> be called after driver remove() callback was invoked.
>>>>
>>>> How is that possible? The video_device_register must be the last thing in
>>>> the probe function. If it succeeds, then the probe must succeed as well.
>>>>
>>>> Note that I now realize that this might fail in the case of multiple device
>>>> nodes being registered. We never had problems with that in the past, but in
>>>> theory you can the race condition you mention in that scenario. The correct
>>>> approach here would probably be to always return 0 in probe() if only some
>>>> of the video_device_register calls fail.
>>>>
>>>> Anyway, assuming that only one device node is created, then I can't see how
>>>> you can get a race condition here. Any open() call will increase the module's
>>>> refcount, making it impossible to unload.
>>>
>>> The main issue is that in the exynos4-is driver there are multiple platform
>>> devices (these represent IP blocks that are scattered across various Exynos
>>> SoC revisions). Drivers of this platform devices create subdevs and video
>>> nodes are registered in subdev's .registered() callback. This allows the
>>> drivers to be more modular and self contained. But also during video device
>>> registration proper struct v4l2_device (and through this struct 
>>> media_device)
>>> can be passed so the video nodes are attached to the common media driver.
>>>
>>> In probe() of the media driver all subdevs are registered and this triggers
>>> video nodes creation. The media device first registers all platform devices.
>>> Subdevs are created and all video nodes. After that are being registered
>>> sensor subdevs and media links are created. Then all subdev devnodes are
>>> created in a single call. Note this single call is a bit contrary to the
>>> v4l2-sync API concepts.
>>
>> It wouldn't be difficult to add a v4l2_device_register_subdev_node() function.
>> The v4l2_device_register_subdev_nodes() function predates v4l2-sync, so if
>> it needs some tweaking to make it behave better with v4l2-sync, then that's
>> no problem.
> 
> Yeah, I think it will need to be added sooner or later.
> 
>>> So there is lots of things that may fail after first video node is 
>>> registered, and on which open() may be called immediately.
>>
>> The only solution I know off-hand to handle this safely is to unregister all
>> nodes if some fail, but to return 0 in the probe function. If an open() succeeded,
>> then that will just work until the node is closed, at which point the v4l2_device
>> release() is called and you can cleanup.
> 
> Another solution would be to properly implement struct video_device::release()
> callback and in video device open() do video_is_registered() check while
> holding the driver private video device lock. Also video_unregister_device()
> would need to be called while holding the video device lock.

How would that help?

> Then some devm_* calls would need to be removed from drivers, as we wanted to
> have driver private data structure released on in video_device::release(), not
> after failed driver probe() or after remove() call.
> 
> As a side note, in some previous posts a had been confusing struct device embedded 
> in struct video_device with struct device of a physical device, e.g. associated
> with struct platform_device. As video_get_drvdata() uses the former it should be
> safe to  use this helper in open(), even without holding the driver private lock.
> 
>> What does alsa do with multiple node creation?
> 
> Not sure, maybe Mark could shed some light on that (I've added him at Cc).
> 
>>>> As far as I can tell, once you call rmmod it should no longer be possible to
>>>> open() an device node whose struct file_operations owner is that module (i.e.
>>>> the owner field of the file_operations struct points to that module). Looking
>>>> at the way fs/char_dev is implemented, that seems to be correctly handled by
>>>> the kernel core.
>>>
>>> Yes, that's a good news. There is only still the issue with the "unbind" 
>>> sysfs attribute. I couldn't see any similar checks done around code 
>>> implementing it.
>>
>> That seems to me to be a core code issue and should be solved there.
> 
> Yeah, as I noted below, it has been tried already [1].
> 
>>>>> This issue is actually not only related to deferred probing. It can be
>>>>> also triggered by driver module removal or through driver's sysfs "unbind"
>>>>> attribute.
>>>>>
>>>>> Let's assume following scenario:
>>>>>
>>>>> - a driver module is loaded
>>>>> - driver probe is called, it registers video device,
>>>>> - udev opens /dev/video
>>>>> - after mutex_unlock(&videodev_lock); call in v4l2_open() in v4l2-core/
>>>>>    v4l2-dev.c something fails in probe()
>>>>
>>>> And that shouldn't happen. That's the crucial bit: under which scenario does
>>>> this happen for you? If there is a control path where you do create a working
>>>> device node, but the probe fails, then that will indeed cause all sorts of
>>>> problems. But it shouldn't get in that situation (except I think in the case
>>>> of multiple device nodes, which is something I need to think about).
>>>
>>> Yes, I'm dealing with multiple device nodes created from within media 
>>> driver's
>>> probe(). As described above, there is quite a few things that could 
>>> fail, even
>>> single sub-driver created multiple nodes for same IP block (capture, 
>>> mem-to-mem).
>>
>> Is this 'could fail', or 'I have seen it fail'? I have never seen problems in probe()
>> with node creation. The only reason I know of why creating a node might fail is
>> being out-of-memory.
> 
> In my case it was top level driver that was triggering device node creation
> in its probe() and if, e.g. some of sub-device's driver was missing, it called,
> through subdev internal unregistered(), op video_unregister_device(), but also
> media_entity_cleanup() which seemed to be the source of trouble.

Device nodes should always be created at the very end after all other setup
actions (like loaded subdev drivers) have been done successfully. From your
description it seems that device nodes were created earlier? 

> So if we ignore the sysfs "unbind" issue the problem might not be that severe.
> Not sure if we should. Nevertheless most of drivers crash kernel currently
> when "unbind" is used on them. E.g. I could unbind unregister the system power
> management device (PMIC) driver, the regulator core warned about regulators 
> in use being removed and then cpufreq killed the system completely due to a
> dangling pointer to a regulator object.
> 
>>>>> and it unwinds, probe callback
>>>>>    exits and the driver code code calls dev_set_drvdata(dev, NULL); as
>>>>>    shown below.
>>>>>
>>>>> static int really_probe(struct device *dev, struct device_driver *drv)
>>>>> {
>>>>> 	...
>>>>> 	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
>>>>> 		 drv->bus->name, __func__, drv->name, dev_name(dev));
>>>>> 	...
>>>>> 	if (dev->bus->probe) {
>>>>> 		ret = dev->bus->probe(dev);
>>>>> 		if (ret)
>>>>> 			goto probe_failed;
>>>>> 	} else if (drv->probe) {
>>>>> 		ret = drv->probe(dev);
>>>>> 		if (ret)
>>>>> 			goto probe_failed;
>>>>> 	}
>>>>> 	...
>>>>> 	pr_debug("bus: '%s': %s: bound device %s to driver %s\n",
>>>>> 		 drv->bus->name, __func__, dev_name(dev), drv->name);
>>>>> 	goto done;
>>>>>
>>>>> probe_failed:
>>>>> 	devres_release_all(dev);
>>>>> 	driver_sysfs_remove(dev);
>>>>> 	dev->driver = NULL;
>>>>> 	dev_set_drvdata(dev, NULL);
>>>>>
>>>>> 	...
>>>>> 	ret = 0;
>>>>> done:
>>>>> 	...
>>>>> 	return ret;
>>>>> }
>>>>>
>>>>> Now we get to
>>>>>
>>>>>   	ret = vdev->fops->open(filp);
>>>>>
>>>>> in v4l2_open(). This calls some driver's callback, e.g. something
>>>>> like:
>>>>>
>>>>> static int fimc_lite_open(struct file *file)
>>>>> {
>>>>> 	struct fimc_lite *fimc = video_drvdata(file);
>>>>> 	struct media_entity *me =&fimc->ve.vdev.entity;
>>>>> 	int ret;
>>>>>
>>>>> 	mutex_lock(&fimc->lock);
>>>>> 	if (!video_is_registered(&fimc->ve.vdev)) {
>>>>> 		ret = -ENODEV;
>>>>> 		goto unlock;
>>>>> 	}
>>>>>
>>>>> 	...
>>>>>
>>>>> 	/* Mark video pipeline ending at this video node as in use. */
>>>>> 	if (ret == 0)
>>>>> 		me->use_count++;
>>>>> 	...
>>>>> unlock:
>>>>> 	mutex_unlock(&fimc->lock);
>>>>> 	return ret;
>>>>> }
>>>>>
>>>>> Now what will video_drvdata(file); return ?
>>>>>
>>>>> static inline void *video_drvdata(struct file *file)
>>>>> {
>>>>> 	return video_get_drvdata(video_devdata(file));
>>>>> }
>>>>>
>>>>> static inline void *video_get_drvdata(struct video_device *vdev)
>>>>> {
>>>>> 	return dev_get_drvdata(&vdev->dev);
>>>>> }
>>>>>
>>>>> Yes, so that will be just NULL o_O, due to the dev_set_drvdata(dev, NULL);
>>>>> in really_probe(). drvdata is cleared similarly in __device_release_driver(),
>>>>> right after calling driver's remove handler.
>>>>>
>>>>> Another issue I have is that, e.g. driver/media/platform/exynos4-is/fimc-lite*
>>>>> driver has empty video dev release() callback. It should be implemented
>>>>> in the driver to kfree the whole driver's private data structure where
>>>>> struct video_device is embedded in (struct fimc_lite). But that freeing
>>>>> really needs to be synchronized with driver's remove() call, since there
>>>>> is e.g. freed interrupt which accesses the driver's private data. I can't
>>>>> use kref from struct v4l2_device as that belongs to a different driver.
>>>>> A driver's custom reference counting comes to mind, where vdev->release()
>>>>> and drv->remove() would be decrementing the reference counter. But that
>>>>> seems ugly as hell :/ And it predates devm_*.
>>>>>
>>>>> This is all getting a bit depressing :/ Deferred probing and the
>>>>> asynchronous subdev handling just made those issues more visible, i.e.
>>>>> not very good design of some parts of the v4l2-core.
>>>>
>>>> It's just not clear to me how exactly things go wrong for you. Ping me on irc
>>>> tomorrow and we can discuss it further. I have reworked refcounting in the
>>>> past (at the time it was *really* bad), so perhaps we need to rework it again,
>>>> particularly with video nodes associated with subdevices in the mix, something
>>>> that didn't exist at the time.
>>>
>>> Thanks, and sorry for holding on with that for too long.
>>>
>>> The main issue as I see it is that we need to track both driver remove() 
>>> and
>>> struct device .release() calls and free resources only when last of them
>>> executes. Data structures which are referenced in fops must not be freed in
>>> remove() and we cannot use dev_get_drvdata() in fops, e.g. not protected 
>>> with device_lock().
>>
>> You can do all that by returning 0 if probe() was partially successful (i.e.
>> one or more, but not all, nodes were created successfully) by doing what I
>> described above. I don't see another way that doesn't introduce a race condition.
> 
> Yes, that could be a work around for the problem. However it doesn't seem right 
> to assume at the subsystem level. For example errors like EPROBE_DEFER should 
> be propagated transparently by drivers so the driver core retries probing. 
> 
> It is all easy to trigger with the sysfs unbind feature. Regarding fixing 
> it at the driver core, it seems there have been brave that tried it already 
> [1]. :) It's surprising how poorly this feature is handled by many drivers or 
> even subsystems.

I really don't think the unbind issue can be solved in drivers without improved
core support.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open()
  2013-09-11 14:01             ` Hans Verkuil
@ 2013-09-12 10:19               ` Sylwester Nawrocki
  0 siblings, 0 replies; 18+ messages in thread
From: Sylwester Nawrocki @ 2013-09-12 10:19 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Sylwester Nawrocki, linux-media, laurent.pinchart, Kyungmin Park,
	Mark Brown, Andrzej Hajda

Hi Hans,

On 09/11/2013 04:01 PM, Hans Verkuil wrote:
> On 09/11/2013 03:07 PM, Sylwester Nawrocki wrote:
>> On 09/09/2013 11:07 AM, Hans Verkuil wrote:
>>> On 09/06/2013 12:33 AM, Sylwester Nawrocki wrote:
>>>> On 08/07/2013 07:49 PM, Hans Verkuil wrote:
>>>>> On 08/07/2013 06:49 PM, Sylwester Nawrocki wrote:
>>>>>> On 08/02/2013 03:00 PM, Hans Verkuil wrote:
>>>>>>> On 08/02/2013 02:27 PM, Sylwester Nawrocki wrote:
[...]
>>>> So there is lots of things that may fail after first video node is 
>>>> registered, and on which open() may be called immediately.
>>>
>>> The only solution I know off-hand to handle this safely is to unregister all
>>> nodes if some fail, but to return 0 in the probe function. If an open() succeeded,
>>> then that will just work until the node is closed, at which point the v4l2_device
>>> release() is called and you can cleanup.
>>
>> Another solution would be to properly implement struct video_device::release()
>> callback and in video device open() do video_is_registered() check while
>> holding the driver private video device lock. Also video_unregister_device()
>> would need to be called while holding the video device lock.
> 
> How would that help?

By not allowing video_unregister_device() call in the middle of the file op and
serializing whatever does video device unregistration with the file ops. I'm not
sure it is possible in all cases.

For ioctls it's already party taken care of by video_is_registered() check while 
holding the video dev mutex. For other file ops drivers currently need to do 
various checks to ensure that all driver private data/resources that may be needed 
in the file operations callbacks are in place.

[...]
>>> Is this 'could fail', or 'I have seen it fail'? I have never seen problems in probe()
>>> with node creation. The only reason I know of why creating a node might fail is
>>> being out-of-memory.
>>
>> In my case it was top level driver that was triggering device node creation
>> in its probe() and if, e.g. some of sub-device's driver was missing, it called,
>> through subdev internal unregistered(), op video_unregister_device(), but also
>> media_entity_cleanup() which seemed to be the source of trouble.
> 
> Device nodes should always be created at the very end after all other setup
> actions (like loaded subdev drivers) have been done successfully. From your
> description it seems that device nodes were created earlier? 

Yes, let me explain in what configuration it happens (perhaps looking at 
fimc_md_probe() function at drives/media/platform/exynos4-is/media-dev.c makes
it easier to understand; the version closer to what we work with currently
is available at [1]).

There are multiple platform devices that in their driver's probe() just 
initialize the driver private data and platform device resources, like irq, 
IO register, etc.

There is also a top level driver (struct v4l2_device) that walks through 
Device Tree and registers those platform devices as v4l2 subdevs. While 
doing that video nodes are created in some subdevs' .registered() callbacks.

After that image sensors are being registered.

Some platform devices need to be registered before image sensors, due to 
resource dependencies. E.g. some device need to be activated so clock for 
an image sensor is available at an SoC output pin.

So this is a bit more complex issue than with a single device, a single 
driver and its probe() interaction with the file ops.

[1] git.linuxtv.org/snawrocki/samsung.git/v3.11-rc2-dts-exynos4-is-clk

--
Regards,
Sylwester

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2013-09-12 10:19 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-02 12:27 [PATCH] V4L: Drop meaningless video_is_registered() call in v4l2_open() Sylwester Nawrocki
2013-08-02 13:00 ` Hans Verkuil
2013-08-07 16:49   ` Sylwester Nawrocki
2013-08-07 17:49     ` Hans Verkuil
2013-08-07 22:16       ` Laurent Pinchart
2013-08-08 12:36       ` Andrzej Hajda
2013-09-05 22:33       ` Sylwester Nawrocki
2013-09-09  9:07         ` Hans Verkuil
2013-09-09 10:00           ` Laurent Pinchart
2013-09-09 10:07             ` Hans Verkuil
2013-09-09 10:10               ` Laurent Pinchart
2013-09-09 10:17                 ` Hans Verkuil
2013-09-09 10:24                   ` Laurent Pinchart
2013-09-09 10:37                     ` Hans Verkuil
2013-09-09 10:48                       ` Hans Verkuil
2013-09-11 13:07           ` Sylwester Nawrocki
2013-09-11 14:01             ` Hans Verkuil
2013-09-12 10:19               ` Sylwester Nawrocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.