* [RFC v3 00/21] Make use of kref in media device, grab references as needed
@ 2016-08-26 23:43 Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 01/21] Revert "[media] media: fix media devnode ioctl/syscall and unregister race" Sakari Ailus
                   ` (21 more replies)
  0 siblings, 22 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

Hi folks,

This is the third version of the RFC set to fix referencing in media
devices.

The lifetime of the media device (and media devnode) is now bound to that
of the struct device embedded in it, and its memory is only released once
the last reference is gone: unregistering is simply unregistering, and it
should no longer release memory which could still be accessed.
                                                                                
A video node or a sub-device node also gets a reference to the media
device, i.e. the release function of the video device node will release
its reference to the media device. The same goes for file handles to the
media device.
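
As a rough userspace illustration of the rule above (not code from this
set), the sketch below models an object whose memory is released only by
the last put, while unregistering merely changes state. All names
(fake_mdev, fake_mdev_get() and so on) are hypothetical, and a plain int
stands in for the kernel's reference counting, so this is single-threaded
only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object; not struct media_device. */
struct fake_mdev {
	int refcount;    /* plain int: this sketch is single-threaded */
	int registered;  /* nonzero while the object is registered */
};

int fake_released;       /* how many objects have been freed so far */

struct fake_mdev *fake_mdev_alloc(void)
{
	struct fake_mdev *mdev = calloc(1, sizeof(*mdev));

	if (!mdev)
		return NULL;
	mdev->refcount = 1;  /* the allocating driver's reference */
	return mdev;
}

void fake_mdev_get(struct fake_mdev *mdev)
{
	mdev->refcount++;
}

/* Returns 1 if this put released the memory, 0 otherwise. */
int fake_mdev_put(struct fake_mdev *mdev)
{
	if (--mdev->refcount > 0)
		return 0;
	free(mdev);
	fake_released++;
	return 1;
}

/* Unregistering only changes state; it never frees anything. */
void fake_mdev_unregister(struct fake_mdev *mdev)
{
	mdev->registered = 0;
}

/* Walks through "unbind while a file handle is open"; returns 0 on success. */
int fake_mdev_demo(void)
{
	struct fake_mdev *mdev = fake_mdev_alloc();

	if (!mdev)
		return -1;
	mdev->registered = 1;
	fake_mdev_get(mdev);          /* e.g. an open file handle */
	fake_mdev_unregister(mdev);   /* the driver is unbound */
	if (fake_mdev_put(mdev))      /* the driver drops its reference */
		return -1;            /* must not free: handle still open */
	if (mdev->registered)         /* memory is still valid to read */
		return -1;
	return fake_mdev_put(mdev) ? 0 : -1;  /* last reference frees */
}
```

Calling fake_mdev_demo() exercises the sequence described above: the
object survives the unregister and the driver's put, and is freed only when
the file handle's reference is dropped.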
                                                                                
A side effect of this is that the media device is allocated together
with the media devnode. The driver may also tie its own resources to the
media device. Alternatively, there's also a priv field to hold the driver's
private pointer (container_of() is an option in this case). We could
drop one of these options but currently both are possible.
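
For readers less familiar with the two lookup styles mentioned above, here
is a generic userspace sketch of both: recovering the driver structure
with container_of() when the object is embedded, and following a private
pointer otherwise. The structures and helpers (ex_mdev, ex_driver, ...)
are made up for illustration, the container_of() macro is a simplified
userspace re-definition, and how either style maps onto
media_device_alloc() is not something this sketch tries to show.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace version; the kernel provides its own container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical core object with a private pointer for its owner. */
struct ex_mdev {
	void *priv;
};

/* Hypothetical driver structure that may embed the core object. */
struct ex_driver {
	int magic;
	struct ex_mdev mdev;
};

/* Option 1: the core object is embedded, so walk back to the container. */
struct ex_driver *ex_driver_from_embedded(struct ex_mdev *mdev)
{
	return container_of(mdev, struct ex_driver, mdev);
}

/* Option 2: the core object stands alone and points at its owner. */
struct ex_driver *ex_driver_from_priv(struct ex_mdev *mdev)
{
	return mdev->priv;
}

/* Checks that both lookups find the right driver; returns 0 on success. */
int ex_lookup_demo(void)
{
	struct ex_driver embedded = { .magic = 1 };
	struct ex_driver separate = { .magic = 2 };
	struct ex_mdev standalone = { .priv = &separate };

	if (ex_driver_from_embedded(&embedded.mdev) != &embedded)
		return -1;
	if (ex_driver_from_priv(&standalone) != &separate)
		return -1;
	return 0;
}
```

The embedded form needs no extra pointer but ties the two allocations
together; the priv form lets the two objects be allocated separately.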
                                                                                
I've tested this by manually unbinding the omap3isp platform device while
streaming. Driver changes are required for this to work; by not using
dynamic allocation (i.e. media_device_alloc()) the old behaviour is still
supported. This is still unlikely to be a grave problem as there are not
that many device drivers that support physically removable devices. We've
had this problem for other devices for many years without paying much
notice. That doesn't mean I think drivers for removable devices shouldn't
be changed as part of the set later on; I'd just like to get review
comments on the approach first.
                                                                                
The three patches that originally partially resolved some of these issues
are reverted at the beginning of the set. I'm still posting this as an RFC
mainly since the testing is somewhat limited so far.

changes since v2:

- Rework the set in order to make the changes more consistent, easier to
  understand and better ordered.

- Properly change referencing media_dev->dev (patch "media device: Get the
  media device driver's device" added).

- Only set the release() callback for the media device if the new
  media_device_alloc() API is used. (The callback just printed a debug
  message before this series.)

- Call cdev_del() before removing the device (patch 7).

- Document media_device_init() and media_device_cleanup() as deprecated.

- Spelling fixes.

The to-do list includes changes to drivers that can be physically removed.
Drivers not using the new API can mostly ignore these changes, albeit
media_device_init() now grabs a reference to struct device of the media
device which must be released.

-- 
Kind regards,
Sakari


* [RFC v3 01/21] Revert "[media] media: fix media devnode ioctl/syscall and unregister race"
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 02/21] Revert "[media] media: fix use-after-free in cdev_put() when app exits after driver unbind" Sakari Ailus
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

This reverts commit 6f0dd24a084a ("[media] media: fix media devnode
ioctl/syscall and unregister race"). The commit was part of an original
patchset to avoid crashes when an unregistering device is in use.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c  | 15 +++++++--------
 drivers/media/media-devnode.c |  8 +-------
 include/media/media-devnode.h | 16 ++--------------
 3 files changed, 10 insertions(+), 29 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 1795abe..33a9952 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -732,7 +732,6 @@ int __must_check __media_device_register(struct media_device *mdev,
 	if (ret < 0) {
 		/* devnode free is handled in media_devnode_*() */
 		mdev->devnode = NULL;
-		media_devnode_unregister_prepare(devnode);
 		media_devnode_unregister(devnode);
 		return ret;
 	}
@@ -789,9 +788,6 @@ void media_device_unregister(struct media_device *mdev)
 		return;
 	}
 
-	/* Clear the devnode register bit to avoid races with media dev open */
-	media_devnode_unregister_prepare(mdev->devnode);
-
 	/* Remove all entities from the media device */
 	list_for_each_entry_safe(entity, next, &mdev->entities, graph_obj.list)
 		__media_device_unregister_entity(entity);
@@ -812,10 +808,13 @@ void media_device_unregister(struct media_device *mdev)
 
 	dev_dbg(mdev->dev, "Media device unregistered\n");
 
-	device_remove_file(&mdev->devnode->dev, &dev_attr_model);
-	media_devnode_unregister(mdev->devnode);
-	/* devnode free is handled in media_devnode_*() */
-	mdev->devnode = NULL;
+	/* Check if mdev devnode was registered */
+	if (media_devnode_is_registered(mdev->devnode)) {
+		device_remove_file(&mdev->devnode->dev, &dev_attr_model);
+		media_devnode_unregister(mdev->devnode);
+		/* devnode free is handled in media_devnode_*() */
+		mdev->devnode = NULL;
+	}
 }
 EXPORT_SYMBOL_GPL(media_device_unregister);
 
diff --git a/drivers/media/media-devnode.c b/drivers/media/media-devnode.c
index f2772ba..5b605ff 100644
--- a/drivers/media/media-devnode.c
+++ b/drivers/media/media-devnode.c
@@ -287,7 +287,7 @@ cdev_add_error:
 	return ret;
 }
 
-void media_devnode_unregister_prepare(struct media_devnode *devnode)
+void media_devnode_unregister(struct media_devnode *devnode)
 {
 	/* Check if devnode was ever registered at all */
 	if (!media_devnode_is_registered(devnode))
@@ -295,12 +295,6 @@ void media_devnode_unregister_prepare(struct media_devnode *devnode)
 
 	mutex_lock(&media_devnode_lock);
 	clear_bit(MEDIA_FLAG_REGISTERED, &devnode->flags);
-	mutex_unlock(&media_devnode_lock);
-}
-
-void media_devnode_unregister(struct media_devnode *devnode)
-{
-	mutex_lock(&media_devnode_lock);
 	/* Delete the cdev on this minor as well */
 	cdev_del(&devnode->cdev);
 	mutex_unlock(&media_devnode_lock);
diff --git a/include/media/media-devnode.h b/include/media/media-devnode.h
index 37d4948..d5037a9 100644
--- a/include/media/media-devnode.h
+++ b/include/media/media-devnode.h
@@ -127,26 +127,14 @@ int __must_check media_devnode_register(struct media_device *mdev,
 					struct module *owner);
 
 /**
- * media_devnode_unregister_prepare - clear the media device node register bit
- * @devnode: the device node to prepare for unregister
- *
- * This clears the passed device register bit. Future open calls will be met
- * with errors. Should be called before media_devnode_unregister() to avoid
- * races with unregister and device file open calls.
- *
- * This function can safely be called if the device node has never been
- * registered or has already been unregistered.
- */
-void media_devnode_unregister_prepare(struct media_devnode *devnode);
-
-/**
  * media_devnode_unregister - unregister a media device node
  * @devnode: the device node to unregister
  *
  * This unregisters the passed device. Future open calls will be met with
  * errors.
  *
- * Should be called after media_devnode_unregister_prepare()
+ * This function can safely be called if the device node has never been
+ * registered or has already been unregistered.
  */
 void media_devnode_unregister(struct media_devnode *devnode);
 
-- 
2.1.4



* [RFC v3 02/21] Revert "[media] media: fix use-after-free in cdev_put() when app exits after driver unbind"
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 01/21] Revert "[media] media: fix media devnode ioctl/syscall and unregister race" Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 03/21] Revert "[media] media-device: dynamically allocate struct media_devnode" Sakari Ailus
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

This reverts commit 5b28dde51d0c ("[media] media: fix use-after-free in
cdev_put() when app exits after driver unbind"). The commit was part of an
original patchset to avoid crashes when an unregistering device is in use.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c  |  6 ++----
 drivers/media/media-devnode.c | 48 +++++++++++++++++--------------------------
 2 files changed, 21 insertions(+), 33 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 33a9952..e61fa66 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -723,16 +723,16 @@ int __must_check __media_device_register(struct media_device *mdev,
 
 	ret = media_devnode_register(mdev, devnode, owner);
 	if (ret < 0) {
-		/* devnode free is handled in media_devnode_*() */
 		mdev->devnode = NULL;
+		kfree(devnode);
 		return ret;
 	}
 
 	ret = device_create_file(&devnode->dev, &dev_attr_model);
 	if (ret < 0) {
-		/* devnode free is handled in media_devnode_*() */
 		mdev->devnode = NULL;
 		media_devnode_unregister(devnode);
+		kfree(devnode);
 		return ret;
 	}
 
@@ -812,8 +812,6 @@ void media_device_unregister(struct media_device *mdev)
 	if (media_devnode_is_registered(mdev->devnode)) {
 		device_remove_file(&mdev->devnode->dev, &dev_attr_model);
 		media_devnode_unregister(mdev->devnode);
-		/* devnode free is handled in media_devnode_*() */
-		mdev->devnode = NULL;
 	}
 }
 EXPORT_SYMBOL_GPL(media_device_unregister);
diff --git a/drivers/media/media-devnode.c b/drivers/media/media-devnode.c
index 5b605ff..ecdc02d 100644
--- a/drivers/media/media-devnode.c
+++ b/drivers/media/media-devnode.c
@@ -63,8 +63,13 @@ static void media_devnode_release(struct device *cd)
 	struct media_devnode *devnode = to_media_devnode(cd);
 
 	mutex_lock(&media_devnode_lock);
+
+	/* Delete the cdev on this minor as well */
+	cdev_del(&devnode->cdev);
+
 	/* Mark device node number as free */
 	clear_bit(devnode->minor, media_devnode_nums);
+
 	mutex_unlock(&media_devnode_lock);
 
 	/* Release media_devnode and perform other cleanups as needed. */
@@ -72,7 +77,6 @@ static void media_devnode_release(struct device *cd)
 		devnode->release(devnode);
 
 	kfree(devnode);
-	pr_debug("%s: Media Devnode Deallocated\n", __func__);
 }
 
 static struct bus_type media_bus_type = {
@@ -201,8 +205,6 @@ static int media_release(struct inode *inode, struct file *filp)
 	/* decrease the refcount unconditionally since the release()
 	   return value is ignored. */
 	put_device(&devnode->dev);
-
-	pr_debug("%s: Media Release\n", __func__);
 	return 0;
 }
 
@@ -233,7 +235,6 @@ int __must_check media_devnode_register(struct media_device *mdev,
 	if (minor == MEDIA_NUM_DEVICES) {
 		mutex_unlock(&media_devnode_lock);
 		pr_err("could not get a free minor\n");
-		kfree(devnode);
 		return -ENFILE;
 	}
 
@@ -243,31 +244,27 @@ int __must_check media_devnode_register(struct media_device *mdev,
 	devnode->minor = minor;
 	devnode->media_dev = mdev;
 
-	/* Part 1: Initialize dev now to use dev.kobj for cdev.kobj.parent */
-	devnode->dev.bus = &media_bus_type;
-	devnode->dev.devt = MKDEV(MAJOR(media_dev_t), devnode->minor);
-	devnode->dev.release = media_devnode_release;
-	if (devnode->parent)
-		devnode->dev.parent = devnode->parent;
-	dev_set_name(&devnode->dev, "media%d", devnode->minor);
-	device_initialize(&devnode->dev);
-
 	/* Part 2: Initialize and register the character device */
 	cdev_init(&devnode->cdev, &media_devnode_fops);
 	devnode->cdev.owner = owner;
-	devnode->cdev.kobj.parent = &devnode->dev.kobj;
 
 	ret = cdev_add(&devnode->cdev, MKDEV(MAJOR(media_dev_t), devnode->minor), 1);
 	if (ret < 0) {
 		pr_err("%s: cdev_add failed\n", __func__);
-		goto cdev_add_error;
+		goto error;
 	}
 
-	/* Part 3: Add the media device */
-	ret = device_add(&devnode->dev);
+	/* Part 3: Register the media device */
+	devnode->dev.bus = &media_bus_type;
+	devnode->dev.devt = MKDEV(MAJOR(media_dev_t), devnode->minor);
+	devnode->dev.release = media_devnode_release;
+	if (devnode->parent)
+		devnode->dev.parent = devnode->parent;
+	dev_set_name(&devnode->dev, "media%d", devnode->minor);
+	ret = device_register(&devnode->dev);
 	if (ret < 0) {
-		pr_err("%s: device_add failed\n", __func__);
-		goto device_add_error;
+		pr_err("%s: device_register failed\n", __func__);
+		goto error;
 	}
 
 	/* Part 4: Activate this minor. The char device can now be used. */
@@ -275,15 +272,12 @@ int __must_check media_devnode_register(struct media_device *mdev,
 
 	return 0;
 
-device_add_error:
-	cdev_del(&devnode->cdev);
-cdev_add_error:
+error:
 	mutex_lock(&media_devnode_lock);
+	cdev_del(&devnode->cdev);
 	clear_bit(devnode->minor, media_devnode_nums);
-	devnode->media_dev = NULL;
 	mutex_unlock(&media_devnode_lock);
 
-	put_device(&devnode->dev);
 	return ret;
 }
 
@@ -295,12 +289,8 @@ void media_devnode_unregister(struct media_devnode *devnode)
 
 	mutex_lock(&media_devnode_lock);
 	clear_bit(MEDIA_FLAG_REGISTERED, &devnode->flags);
-	/* Delete the cdev on this minor as well */
-	cdev_del(&devnode->cdev);
 	mutex_unlock(&media_devnode_lock);
-	device_del(&devnode->dev);
-	devnode->media_dev = NULL;
-	put_device(&devnode->dev);
+	device_unregister(&devnode->dev);
 }
 
 /*
-- 
2.1.4



* [RFC v3 03/21] Revert "[media] media-device: dynamically allocate struct media_devnode"
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 01/21] Revert "[media] media: fix media devnode ioctl/syscall and unregister race" Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 02/21] Revert "[media] media: fix use-after-free in cdev_put() when app exits after driver unbind" Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 04/21] media: Remove useless curly braces and parentheses Sakari Ailus
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

This reverts commit a087ce704b80 ("[media] media-device: dynamically
allocate struct media_devnode"). The commit was part of an original
patchset to avoid crashes when an unregistering device is in use.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c           | 44 +++++++++++-----------------------
 drivers/media/media-devnode.c          |  7 +-----
 drivers/media/usb/au0828/au0828-core.c |  4 ++--
 drivers/media/usb/uvc/uvc_driver.c     |  2 +-
 include/media/media-device.h           |  5 +++-
 include/media/media-devnode.h          | 15 ++----------
 6 files changed, 24 insertions(+), 53 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index e61fa66..a1cd50f 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -423,7 +423,7 @@ static long media_device_ioctl(struct file *filp, unsigned int cmd,
 			       unsigned long arg)
 {
 	struct media_devnode *devnode = media_devnode_data(filp);
-	struct media_device *dev = devnode->media_dev;
+	struct media_device *dev = to_media_device(devnode);
 	long ret;
 
 	mutex_lock(&dev->graph_mutex);
@@ -495,7 +495,7 @@ static long media_device_compat_ioctl(struct file *filp, unsigned int cmd,
 				      unsigned long arg)
 {
 	struct media_devnode *devnode = media_devnode_data(filp);
-	struct media_device *dev = devnode->media_dev;
+	struct media_device *dev = to_media_device(devnode);
 	long ret;
 
 	switch (cmd) {
@@ -531,8 +531,7 @@ static const struct media_file_operations media_device_fops = {
 static ssize_t show_model(struct device *cd,
 			  struct device_attribute *attr, char *buf)
 {
-	struct media_devnode *devnode = to_media_devnode(cd);
-	struct media_device *mdev = devnode->media_dev;
+	struct media_device *mdev = to_media_device(to_media_devnode(cd));
 
 	return sprintf(buf, "%.*s\n", (int)sizeof(mdev->model), mdev->model);
 }
@@ -705,34 +704,23 @@ EXPORT_SYMBOL_GPL(media_device_cleanup);
 int __must_check __media_device_register(struct media_device *mdev,
 					 struct module *owner)
 {
-	struct media_devnode *devnode;
 	int ret;
 
-	devnode = kzalloc(sizeof(*devnode), GFP_KERNEL);
-	if (!devnode)
-		return -ENOMEM;
-
 	/* Register the device node. */
-	mdev->devnode = devnode;
-	devnode->fops = &media_device_fops;
-	devnode->parent = mdev->dev;
-	devnode->release = media_device_release;
+	mdev->devnode.fops = &media_device_fops;
+	mdev->devnode.parent = mdev->dev;
+	mdev->devnode.release = media_device_release;
 
 	/* Set version 0 to indicate user-space that the graph is static */
 	mdev->topology_version = 0;
 
-	ret = media_devnode_register(mdev, devnode, owner);
-	if (ret < 0) {
-		mdev->devnode = NULL;
-		kfree(devnode);
+	ret = media_devnode_register(&mdev->devnode, owner);
+	if (ret < 0)
 		return ret;
-	}
 
-	ret = device_create_file(&devnode->dev, &dev_attr_model);
+	ret = device_create_file(&mdev->devnode.dev, &dev_attr_model);
 	if (ret < 0) {
-		mdev->devnode = NULL;
-		media_devnode_unregister(devnode);
-		kfree(devnode);
+		media_devnode_unregister(&mdev->devnode);
 		return ret;
 	}
 
@@ -783,7 +771,7 @@ void media_device_unregister(struct media_device *mdev)
 	mutex_lock(&mdev->graph_mutex);
 
 	/* Check if mdev was ever registered at all */
-	if (!media_devnode_is_registered(mdev->devnode)) {
+	if (!media_devnode_is_registered(&mdev->devnode)) {
 		mutex_unlock(&mdev->graph_mutex);
 		return;
 	}
@@ -806,13 +794,9 @@ void media_device_unregister(struct media_device *mdev)
 
 	mutex_unlock(&mdev->graph_mutex);
 
-	dev_dbg(mdev->dev, "Media device unregistered\n");
-
-	/* Check if mdev devnode was registered */
-	if (media_devnode_is_registered(mdev->devnode)) {
-		device_remove_file(&mdev->devnode->dev, &dev_attr_model);
-		media_devnode_unregister(mdev->devnode);
-	}
+	device_remove_file(&mdev->devnode.dev, &dev_attr_model);
+	dev_dbg(mdev->dev, "Media device unregistering\n");
+	media_devnode_unregister(&mdev->devnode);
 }
 EXPORT_SYMBOL_GPL(media_device_unregister);
 
diff --git a/drivers/media/media-devnode.c b/drivers/media/media-devnode.c
index ecdc02d..7481c96 100644
--- a/drivers/media/media-devnode.c
+++ b/drivers/media/media-devnode.c
@@ -44,7 +44,6 @@
 #include <linux/uaccess.h>
 
 #include <media/media-devnode.h>
-#include <media/media-device.h>
 
 #define MEDIA_NUM_DEVICES	256
 #define MEDIA_NAME		"media"
@@ -75,8 +74,6 @@ static void media_devnode_release(struct device *cd)
 	/* Release media_devnode and perform other cleanups as needed. */
 	if (devnode->release)
 		devnode->release(devnode);
-
-	kfree(devnode);
 }
 
 static struct bus_type media_bus_type = {
@@ -222,8 +219,7 @@ static const struct file_operations media_devnode_fops = {
 	.llseek = no_llseek,
 };
 
-int __must_check media_devnode_register(struct media_device *mdev,
-					struct media_devnode *devnode,
+int __must_check media_devnode_register(struct media_devnode *devnode,
 					struct module *owner)
 {
 	int minor;
@@ -242,7 +238,6 @@ int __must_check media_devnode_register(struct media_device *mdev,
 	mutex_unlock(&media_devnode_lock);
 
 	devnode->minor = minor;
-	devnode->media_dev = mdev;
 
 	/* Part 2: Initialize and register the character device */
 	cdev_init(&devnode->cdev, &media_devnode_fops);
diff --git a/drivers/media/usb/au0828/au0828-core.c b/drivers/media/usb/au0828/au0828-core.c
index bf53553..321ea5c 100644
--- a/drivers/media/usb/au0828/au0828-core.c
+++ b/drivers/media/usb/au0828/au0828-core.c
@@ -142,7 +142,7 @@ static void au0828_unregister_media_device(struct au0828_dev *dev)
 	struct media_device *mdev = dev->media_dev;
 	struct media_entity_notify *notify, *nextp;
 
-	if (!mdev || !media_devnode_is_registered(mdev->devnode))
+	if (!mdev || !media_devnode_is_registered(&mdev->devnode))
 		return;
 
 	/* Remove au0828 entity_notify callbacks */
@@ -482,7 +482,7 @@ static int au0828_media_device_register(struct au0828_dev *dev,
 	if (!dev->media_dev)
 		return 0;
 
-	if (!media_devnode_is_registered(dev->media_dev->devnode)) {
+	if (!media_devnode_is_registered(&dev->media_dev->devnode)) {
 
 		/* register media device */
 		ret = media_device_register(dev->media_dev);
diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c
index 302e284..451e84e9 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1674,7 +1674,7 @@ static void uvc_delete(struct uvc_device *dev)
 	if (dev->vdev.dev)
 		v4l2_device_unregister(&dev->vdev);
 #ifdef CONFIG_MEDIA_CONTROLLER
-	if (media_devnode_is_registered(dev->mdev.devnode))
+	if (media_devnode_is_registered(&dev->mdev.devnode))
 		media_device_unregister(&dev->mdev);
 	media_device_cleanup(&dev->mdev);
 #endif
diff --git a/include/media/media-device.h b/include/media/media-device.h
index 2819524..4eee613 100644
--- a/include/media/media-device.h
+++ b/include/media/media-device.h
@@ -116,7 +116,7 @@ struct media_entity_notify {
 struct media_device {
 	/* dev->driver_data points to this struct. */
 	struct device *dev;
-	struct media_devnode *devnode;
+	struct media_devnode devnode;
 
 	char model[32];
 	char driver_name[32];
@@ -162,6 +162,9 @@ struct usb_device;
 #define MEDIA_DEV_NOTIFY_PRE_LINK_CH	0
 #define MEDIA_DEV_NOTIFY_POST_LINK_CH	1
 
+/* media_devnode to media_device */
+#define to_media_device(node) container_of(node, struct media_device, devnode)
+
 /**
  * media_entity_enum_init - Initialise an entity enumeration
  *
diff --git a/include/media/media-devnode.h b/include/media/media-devnode.h
index d5037a9..a0f6823 100644
--- a/include/media/media-devnode.h
+++ b/include/media/media-devnode.h
@@ -33,8 +33,6 @@
 #include <linux/device.h>
 #include <linux/cdev.h>
 
-struct media_device;
-
 /*
  * Flag to mark the media_devnode struct as registered. Drivers must not touch
  * this flag directly, it will be set and cleared by media_devnode_register and
@@ -84,8 +82,6 @@ struct media_file_operations {
  * before registering the node.
  */
 struct media_devnode {
-	struct media_device *media_dev;
-
 	/* device ops */
 	const struct media_file_operations *fops;
 
@@ -108,8 +104,7 @@ struct media_devnode {
 /**
  * media_devnode_register - register a media device node
  *
- * @mdev: struct media_device we want to register a device node
- * @devnode: media device node structure we want to register
+ * @devnode: struct media_devnode we want to register a device node
  * @owner: should be filled with %THIS_MODULE
  *
  * The registration code assigns minor numbers and registers the new device node
@@ -122,8 +117,7 @@ struct media_devnode {
  * the media_devnode structure is *not* called, so the caller is responsible for
  * freeing any data.
  */
-int __must_check media_devnode_register(struct media_device *mdev,
-					struct media_devnode *devnode,
+int __must_check media_devnode_register(struct media_devnode *devnode,
 					struct module *owner);
 
 /**
@@ -153,14 +147,9 @@ static inline struct media_devnode *media_devnode_data(struct file *filp)
  *	false otherwise.
  *
  * @devnode: pointer to struct &media_devnode.
- *
- * Note: If mdev is NULL, it also returns false.
  */
 static inline int media_devnode_is_registered(struct media_devnode *devnode)
 {
-	if (!devnode)
-		return false;
-
 	return test_bit(MEDIA_FLAG_REGISTERED, &devnode->flags);
 }
 
-- 
2.1.4



* [RFC v3 04/21] media: Remove useless curly braces and parentheses
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (2 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 03/21] Revert "[media] media-device: dynamically allocate struct media_devnode" Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 05/21] media: devnode: Rename mdev argument as devnode Sakari Ailus
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
---
 drivers/media/media-device.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index a1cd50f..8bdc316 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -596,9 +596,8 @@ int __must_check media_device_register_entity(struct media_device *mdev,
 			       &entity->pads[i].graph_obj);
 
 	/* invoke entity_notify callbacks */
-	list_for_each_entry_safe(notify, next, &mdev->entity_notify, list) {
-		(notify)->notify(entity, notify->notify_data);
-	}
+	list_for_each_entry_safe(notify, next, &mdev->entity_notify, list)
+		notify->notify(entity, notify->notify_data);
 
 	if (mdev->entity_internal_idx_max
 	    >= mdev->pm_count_walk.ent_enum.idx_max) {
-- 
2.1.4



* [RFC v3 05/21] media: devnode: Rename mdev argument as devnode
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (3 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 04/21] media: Remove useless curly braces and parentheses Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 06/21] media device: Drop nop release callback Sakari Ailus
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

Historically, the mdev argument name was used for both struct
media_device and struct media_devnode. Recently most occurrences of mdev
referring to struct media_devnode were replaced by devnode, which makes
more sense. Fix the last remaining occurrence.

Fixes: 163f1e93e9950 ("[media] media-devnode: fix namespace mess")
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
---
 drivers/media/media-device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 8bdc316..a431775 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -542,9 +542,9 @@ static DEVICE_ATTR(model, S_IRUGO, show_model, NULL);
  * Registration/unregistration
  */
 
-static void media_device_release(struct media_devnode *mdev)
+static void media_device_release(struct media_devnode *devnode)
 {
-	dev_dbg(mdev->parent, "Media device released\n");
+	dev_dbg(devnode->parent, "Media device released\n");
 }
 
 /**
-- 
2.1.4



* [RFC v3 06/21] media device: Drop nop release callback
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (4 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 05/21] media: devnode: Rename mdev argument as devnode Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 07/21] media-device: Make devnode.dev->kobj parent of devnode.cdev Sakari Ailus
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

The release callback is only used to print a debug message. Drop it. (It
will be re-introduced later in a different form.)

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index a431775..d90d8c6 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -542,11 +542,6 @@ static DEVICE_ATTR(model, S_IRUGO, show_model, NULL);
  * Registration/unregistration
  */
 
-static void media_device_release(struct media_devnode *devnode)
-{
-	dev_dbg(devnode->parent, "Media device released\n");
-}
-
 /**
  * media_device_register_entity - Register an entity with a media device
  * @mdev:	The media device
@@ -708,7 +703,6 @@ int __must_check __media_device_register(struct media_device *mdev,
 	/* Register the device node. */
 	mdev->devnode.fops = &media_device_fops;
 	mdev->devnode.parent = mdev->dev;
-	mdev->devnode.release = media_device_release;
 
 	/* Set version 0 to indicate user-space that the graph is static */
 	mdev->topology_version = 0;
-- 
2.1.4



* [RFC v3 07/21] media-device: Make devnode.dev->kobj parent of devnode.cdev
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (5 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 06/21] media device: Drop nop release callback Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 08/21] media: Enable allocating the media device dynamically Sakari Ailus
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

The struct cdev embedded in struct media_devnode contains its own kobj.
Instead of trying to manage its lifetime separately from struct
media_devnode, make the struct media_devnode.dev kobj the parent of the
cdev kobj.

The cdev will thus be released when the media_devnode is unregistered, not
in media_devnode.dev kobj's release callback.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-devnode.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/media/media-devnode.c b/drivers/media/media-devnode.c
index 7481c96..a8302fc 100644
--- a/drivers/media/media-devnode.c
+++ b/drivers/media/media-devnode.c
@@ -63,9 +63,6 @@ static void media_devnode_release(struct device *cd)
 
 	mutex_lock(&media_devnode_lock);
 
-	/* Delete the cdev on this minor as well */
-	cdev_del(&devnode->cdev);
-
 	/* Mark device node number as free */
 	clear_bit(devnode->minor, media_devnode_nums);
 
@@ -241,6 +238,7 @@ int __must_check media_devnode_register(struct media_devnode *devnode,
 
 	/* Part 2: Initialize and register the character device */
 	cdev_init(&devnode->cdev, &media_devnode_fops);
+	devnode->cdev.kobj.parent = &devnode->dev.kobj;
 	devnode->cdev.owner = owner;
 
 	ret = cdev_add(&devnode->cdev, MKDEV(MAJOR(media_dev_t), devnode->minor), 1);
@@ -285,6 +283,7 @@ void media_devnode_unregister(struct media_devnode *devnode)
 	mutex_lock(&media_devnode_lock);
 	clear_bit(MEDIA_FLAG_REGISTERED, &devnode->flags);
 	mutex_unlock(&media_devnode_lock);
+	cdev_del(&devnode->cdev);
 	device_unregister(&devnode->dev);
 }
 
-- 
2.1.4



* [RFC v3 08/21] media: Enable allocating the media device dynamically
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (6 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 07/21] media-device: Make devnode.dev->kobj parent of devnode.cdev Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 09/21] media: Split initialising and adding media devnode Sakari Ailus
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart, Sakari Ailus

From: Sakari Ailus <sakari.ailus@iki.fi>

Allow allocating the media device dynamically. As the struct media_device
embeds struct media_devnode, the lifetime of that object is the same as
that of the media_device.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c | 15 +++++++++++++++
 include/media/media-device.h | 13 +++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index d90d8c6..6eca50c 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -686,6 +686,21 @@ void media_device_init(struct media_device *mdev)
 }
 EXPORT_SYMBOL_GPL(media_device_init);
 
+struct media_device *media_device_alloc(struct device *dev)
+{
+	struct media_device *mdev;
+
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev)
+		return NULL;
+
+	mdev->dev = dev;
+	media_device_init(mdev);
+
+	return mdev;
+}
+EXPORT_SYMBOL_GPL(media_device_alloc);
+
 void media_device_cleanup(struct media_device *mdev)
 {
 	ida_destroy(&mdev->entity_internal_idx);
diff --git a/include/media/media-device.h b/include/media/media-device.h
index 4eee613..1fdfbd7 100644
--- a/include/media/media-device.h
+++ b/include/media/media-device.h
@@ -197,6 +197,15 @@ static inline __must_check int media_entity_enum_init(
 void media_device_init(struct media_device *mdev);
 
 /**
+ * media_device_alloc() - Allocate and initialise a media device
+ *
+ * @dev:	The associated struct device pointer
+ *
+ * Allocate and initialise a media device. Returns a media device.
+ */
+struct media_device *media_device_alloc(struct device *dev);
+
+/**
  * media_device_cleanup() - Cleanups a media device element
  *
  * @mdev:	pointer to struct &media_device
@@ -425,6 +434,10 @@ void __media_device_usb_init(struct media_device *mdev,
 			     const char *driver_name);
 
 #else
+static inline struct media_device *media_device_alloc(struct device *dev)
+{
+	return NULL;
+}
 static inline int media_device_register(struct media_device *mdev)
 {
 	return 0;
-- 
2.1.4



* [RFC v3 09/21] media: Split initialising and adding media devnode
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (7 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 08/21] media: Enable allocating the media device dynamically Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 10/21] media: Shuffle functions around Sakari Ailus
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

Registering a device node of an entity belonging to a media device
will require a reference to the struct device. Taking that reference is
only possible once the device has been initialised, which so far took
place only when it was registered. Split this in two so that the device
can be initialised when the media device is allocated.

Don't distribute the effects of these changes yet. Add media_device_get()
and media_device_put() first.
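
The split can be pictured with a userspace model; this is a sketch under
stated assumptions, not the driver core. The toy_device type and the
toy_* helpers are hypothetical stand-ins for struct device,
device_initialize(), device_add(), device_del(), get_device() and
put_device(). The comments map each step to the function in this patch
that performs it.

```c
#include <assert.h>

/* Hypothetical stand-in for struct device. */
struct toy_device {
	int refcount;	/* 0 until initialised */
	int visible;	/* 1 while "added", i.e. visible to userspace */
	int released;	/* set when the last reference is dropped */
};

static void toy_device_initialize(struct toy_device *d)
{
	d->refcount = 1;
	d->visible = 0;
	d->released = 0;
}

static void toy_device_add(struct toy_device *d) { d->visible = 1; }
static void toy_device_del(struct toy_device *d) { d->visible = 0; }
static void toy_get_device(struct toy_device *d) { d->refcount++; }

static void toy_put_device(struct toy_device *d)
{
	if (--d->refcount == 0)
		d->released = 1;
}

/*
 * Walk the split lifecycle: a user (think "video node") takes a
 * reference before the device is added and drops it only after the
 * device has been deleted. Returns 1 if the device stayed alive until
 * that user let go.
 */
static int toy_split_lifecycle(void)
{
	struct toy_device d;
	int alive_after_unregister;

	toy_device_initialize(&d);	/* media_devnode_init() */
	toy_get_device(&d);		/* early reference, now possible */
	toy_device_add(&d);		/* media_devnode_register() */

	toy_device_del(&d);		/* media_devnode_unregister() */
	toy_put_device(&d);		/* put in media_device_unregister() */
	alive_after_unregister = !d.released && !d.visible;

	toy_put_device(&d);		/* the early user lets go */

	return alive_after_unregister && d.released;
}
```

The point of the model is that unregistering only hides the device; the
memory goes away with the last put, whoever issues it.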

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c  | 18 +++++++++++++-----
 drivers/media/media-devnode.c | 17 +++++++++++------
 include/media/media-devnode.h | 19 ++++++++++++++-----
 3 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 6eca50c..9765031 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -722,19 +722,26 @@ int __must_check __media_device_register(struct media_device *mdev,
 	/* Set version 0 to indicate user-space that the graph is static */
 	mdev->topology_version = 0;
 
+	media_devnode_init(&mdev->devnode);
+
 	ret = media_devnode_register(&mdev->devnode, owner);
 	if (ret < 0)
-		return ret;
+		goto out_put;
 
 	ret = device_create_file(&mdev->devnode.dev, &dev_attr_model);
-	if (ret < 0) {
-		media_devnode_unregister(&mdev->devnode);
-		return ret;
-	}
+	if (ret < 0)
+		goto out_unregister;
 
 	dev_dbg(mdev->dev, "Media device registered\n");
 
 	return 0;
+
+out_unregister:
+	media_devnode_unregister(&mdev->devnode);
+out_put:
+	put_device(&mdev->devnode.dev);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(__media_device_register);
 
@@ -805,6 +812,7 @@ void media_device_unregister(struct media_device *mdev)
 	device_remove_file(&mdev->devnode.dev, &dev_attr_model);
 	dev_dbg(mdev->dev, "Media device unregistering\n");
 	media_devnode_unregister(&mdev->devnode);
+	put_device(&mdev->devnode.dev);
 }
 EXPORT_SYMBOL_GPL(media_device_unregister);
 
diff --git a/drivers/media/media-devnode.c b/drivers/media/media-devnode.c
index a8302fc..178d692 100644
--- a/drivers/media/media-devnode.c
+++ b/drivers/media/media-devnode.c
@@ -216,6 +216,11 @@ static const struct file_operations media_devnode_fops = {
 	.llseek = no_llseek,
 };
 
+void media_devnode_init(struct media_devnode *devnode)
+{
+	device_initialize(&devnode->dev);
+}
+
 int __must_check media_devnode_register(struct media_devnode *devnode,
 					struct module *owner)
 {
@@ -254,7 +259,7 @@ int __must_check media_devnode_register(struct media_devnode *devnode,
 	if (devnode->parent)
 		devnode->dev.parent = devnode->parent;
 	dev_set_name(&devnode->dev, "media%d", devnode->minor);
-	ret = device_register(&devnode->dev);
+	ret = device_add(&devnode->dev);
 	if (ret < 0) {
 		pr_err("%s: device_register failed\n", __func__);
 		goto error;
@@ -284,13 +289,13 @@ void media_devnode_unregister(struct media_devnode *devnode)
 	clear_bit(MEDIA_FLAG_REGISTERED, &devnode->flags);
 	mutex_unlock(&media_devnode_lock);
 	cdev_del(&devnode->cdev);
-	device_unregister(&devnode->dev);
+	device_del(&devnode->dev);
 }
 
 /*
  *	Initialise media for linux
  */
-static int __init media_devnode_init(void)
+static int __init media_devnode_module_init(void)
 {
 	int ret;
 
@@ -312,14 +317,14 @@ static int __init media_devnode_init(void)
 	return 0;
 }
 
-static void __exit media_devnode_exit(void)
+static void __exit media_devnode_module_exit(void)
 {
 	bus_unregister(&media_bus_type);
 	unregister_chrdev_region(media_dev_t, MEDIA_NUM_DEVICES);
 }
 
-subsys_initcall(media_devnode_init);
-module_exit(media_devnode_exit)
+subsys_initcall(media_devnode_module_init);
+module_exit(media_devnode_module_exit)
 
 MODULE_AUTHOR("Laurent Pinchart <laurent.pinchart@ideasonboard.com>");
 MODULE_DESCRIPTION("Device node registration for media drivers");
diff --git a/include/media/media-devnode.h b/include/media/media-devnode.h
index a0f6823..68f4b2f 100644
--- a/include/media/media-devnode.h
+++ b/include/media/media-devnode.h
@@ -102,6 +102,17 @@ struct media_devnode {
 #define to_media_devnode(cd) container_of(cd, struct media_devnode, dev)
 
 /**
+ * media_devnode_init - initialise a media devnode
+ *
+ * @devnode: struct media_devnode we want to initialise
+ *
+ * Initialise a media devnode. Note that after initialising the media
+ * devnode is refcounted. Releasing references to it may be done using
+ * put_device().
+ */
+void media_devnode_init(struct media_devnode *devnode);
+
+/**
  * media_devnode_register - register a media device node
  *
  * @devnode: struct media_devnode we want to register a device node
@@ -111,11 +122,9 @@ struct media_devnode {
  * with the kernel. An error is returned if no free minor number can be found,
  * or if the registration of the device node fails.
  *
- * Zero is returned on success.
- *
- * Note that if the media_devnode_register call fails, the release() callback of
- * the media_devnode structure is *not* called, so the caller is responsible for
- * freeing any data.
+ * Zero is returned on success. Note that in case
+ * media_devnode_register() fails, the caller is responsible for
+ * releasing the reference to the device using put_device().
  */
 int __must_check media_devnode_register(struct media_devnode *devnode,
 					struct module *owner);
-- 
2.1.4



* [RFC v3 10/21] media: Shuffle functions around
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (8 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 09/21] media: Split initialising and adding media devnode Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 11/21] media device: Refcount the media device Sakari Ailus
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

As the call paths of the functions in question will change, move them
around in anticipation of that. No other changes.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
---
 drivers/media/media-device.c | 56 ++++++++++++++++++++++----------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 9765031..3b96de5 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -662,6 +662,34 @@ void media_device_unregister_entity(struct media_entity *entity)
 }
 EXPORT_SYMBOL_GPL(media_device_unregister_entity);
 
+int __must_check media_device_register_entity_notify(struct media_device *mdev,
+					struct media_entity_notify *nptr)
+{
+	mutex_lock(&mdev->graph_mutex);
+	list_add_tail(&nptr->list, &mdev->entity_notify);
+	mutex_unlock(&mdev->graph_mutex);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(media_device_register_entity_notify);
+
+/*
+ * Note: Should be called with mdev->lock held.
+ */
+static void __media_device_unregister_entity_notify(struct media_device *mdev,
+					struct media_entity_notify *nptr)
+{
+	list_del(&nptr->list);
+}
+
+void media_device_unregister_entity_notify(struct media_device *mdev,
+					struct media_entity_notify *nptr)
+{
+	mutex_lock(&mdev->graph_mutex);
+	__media_device_unregister_entity_notify(mdev, nptr);
+	mutex_unlock(&mdev->graph_mutex);
+}
+EXPORT_SYMBOL_GPL(media_device_unregister_entity_notify);
+
 /**
  * media_device_init() - initialize a media device
  * @mdev:	The media device
@@ -745,34 +773,6 @@ out_put:
 }
 EXPORT_SYMBOL_GPL(__media_device_register);
 
-int __must_check media_device_register_entity_notify(struct media_device *mdev,
-					struct media_entity_notify *nptr)
-{
-	mutex_lock(&mdev->graph_mutex);
-	list_add_tail(&nptr->list, &mdev->entity_notify);
-	mutex_unlock(&mdev->graph_mutex);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(media_device_register_entity_notify);
-
-/*
- * Note: Should be called with mdev->lock held.
- */
-static void __media_device_unregister_entity_notify(struct media_device *mdev,
-					struct media_entity_notify *nptr)
-{
-	list_del(&nptr->list);
-}
-
-void media_device_unregister_entity_notify(struct media_device *mdev,
-					struct media_entity_notify *nptr)
-{
-	mutex_lock(&mdev->graph_mutex);
-	__media_device_unregister_entity_notify(mdev, nptr);
-	mutex_unlock(&mdev->graph_mutex);
-}
-EXPORT_SYMBOL_GPL(media_device_unregister_entity_notify);
-
 void media_device_unregister(struct media_device *mdev)
 {
 	struct media_entity *entity;
-- 
2.1.4



* [RFC v3 11/21] media device: Refcount the media device
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (9 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 10/21] media: Shuffle functions around Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 12/21] media device: Initialise media devnode in media_device_init() Sakari Ailus
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

As the struct media_device embeds struct media_devnode, the lifetime of
that object must be the same as that of the media_device.

References are obtained by media_device_get() and released by
media_device_put(). In case a driver uses media_device_alloc() to allocate
its media device, it must release the media device by calling
media_device_put() rather than media_device_cleanup().
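
The alloc/get/put contract can be sketched in userspace as follows. The
toy_mdev type and the toy_mdev_* functions are hypothetical and only
mirror the shape of media_device_alloc(), media_device_get() and
media_device_put(); the plain int counter is an assumption made for
brevity and is not safe for concurrent use, unlike the kref behind
struct device.

```c
#include <assert.h>
#include <stdlib.h>

static int toy_release_count;	/* how many times release ran */

/* Hypothetical stand-in for struct media_device. */
struct toy_mdev {
	int refcount;
	void (*release)(struct toy_mdev *mdev);
};

static void toy_mdev_release(struct toy_mdev *mdev)
{
	toy_release_count++;
	free(mdev);
}

/* Like media_device_alloc(): returns an object holding one reference. */
static struct toy_mdev *toy_mdev_alloc(void)
{
	struct toy_mdev *mdev = calloc(1, sizeof(*mdev));

	if (!mdev)
		return NULL;
	mdev->refcount = 1;
	mdev->release = toy_mdev_release;
	return mdev;
}

static void toy_mdev_get(struct toy_mdev *mdev)
{
	mdev->refcount++;
}

/* Returns 1 if this call dropped the last reference and freed mdev. */
static int toy_mdev_put(struct toy_mdev *mdev)
{
	if (--mdev->refcount > 0)
		return 0;
	mdev->release(mdev);
	return 1;
}
```

A caller that allocates, takes one extra reference and then puts twice
sees the release function run exactly once, on the second put. The
pointer must not be touched after that.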

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c | 13 +++++++++++++
 include/media/media-device.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 44 insertions(+)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 3b96de5..5d3ec84 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -714,6 +714,17 @@ void media_device_init(struct media_device *mdev)
 }
 EXPORT_SYMBOL_GPL(media_device_init);
 
+static void media_device_release(struct media_devnode *devnode)
+{
+	struct media_device *mdev = to_media_device(devnode);
+
+	dev_dbg(devnode->parent, "Media device released\n");
+
+	media_device_cleanup(mdev);
+
+	kfree(mdev);
+}
+
 struct media_device *media_device_alloc(struct device *dev)
 {
 	struct media_device *mdev;
@@ -725,6 +736,8 @@ struct media_device *media_device_alloc(struct device *dev)
 	mdev->dev = dev;
 	media_device_init(mdev);
 
+	mdev->devnode.release = media_device_release;
+
 	return mdev;
 }
 EXPORT_SYMBOL_GPL(media_device_alloc);
diff --git a/include/media/media-device.h b/include/media/media-device.h
index 1fdfbd7..d29dec7 100644
--- a/include/media/media-device.h
+++ b/include/media/media-device.h
@@ -202,10 +202,39 @@ void media_device_init(struct media_device *mdev);
  * @dev:	The associated struct device pointer
  *
  * Allocate and initialise a media device. Returns a media device.
+ * The media device is refcounted, and this function returns a media
+ * device the refcount of which is one (1).
+ *
+ * References are taken and given using media_device_get() and
+ * media_device_put().
  */
 struct media_device *media_device_alloc(struct device *dev);
 
 /**
+ * media_device_get() - Get a reference to a media device
+ *
+ * @mdev: media device
+ */
+#define media_device_get(mdev)						\
+	do {								\
+		dev_dbg((mdev)->dev, "%s: get media device %s\n",	\
+			__func__, (mdev)->bus_info);			\
+		get_device(&(mdev)->devnode.dev);			\
+	} while (0)
+
+/**
+ * media_device_put() - Put a reference to a media device
+ *
+ * @mdev: media device
+ */
+#define media_device_put(mdev)						\
+	do {								\
+		dev_dbg((mdev)->dev, "%s: put media device %s\n",	\
+			__func__, (mdev)->bus_info);			\
+		put_device(&(mdev)->devnode.dev);			\
+	} while (0)
+
+/**
  * media_device_cleanup() - Cleanups a media device element
  *
  * @mdev:	pointer to struct &media_device
@@ -438,6 +467,8 @@ static inline struct media_device *media_device_alloc(struct device *dev)
 {
 	return NULL;
 }
+#define media_device_get(mdev) do { } while (0)
+#define media_device_put(mdev) do { } while (0)
 static inline int media_device_register(struct media_device *mdev)
 {
 	return 0;
-- 
2.1.4



* [RFC v3 12/21] media device: Initialise media devnode in media_device_init()
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (10 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 11/21] media device: Refcount the media device Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 13/21] media device: Deprecate media_device_{init,cleanup}() for drivers Sakari Ailus
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

Call media_devnode_init() from media_device_init(). This has the effect of
initialising the struct device of the media_devnode before it is
registered, making it possible to obtain a reference to it for e.g. video
devices.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 5d3ec84..d534011 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -710,6 +710,8 @@ void media_device_init(struct media_device *mdev)
 	mutex_init(&mdev->graph_mutex);
 	ida_init(&mdev->entity_internal_idx);
 
+	media_devnode_init(&mdev->devnode);
+
 	dev_dbg(mdev->dev, "Media device initialized\n");
 }
 EXPORT_SYMBOL_GPL(media_device_init);
@@ -720,7 +722,10 @@ static void media_device_release(struct media_devnode *devnode)
 
 	dev_dbg(devnode->parent, "Media device released\n");
 
-	media_device_cleanup(mdev);
+	ida_destroy(&mdev->entity_internal_idx);
+	mdev->entity_internal_idx_max = 0;
+	media_entity_graph_walk_cleanup(&mdev->pm_count_walk);
+	mutex_destroy(&mdev->graph_mutex);
 
 	kfree(mdev);
 }
@@ -748,6 +753,7 @@ void media_device_cleanup(struct media_device *mdev)
 	mdev->entity_internal_idx_max = 0;
 	media_entity_graph_walk_cleanup(&mdev->pm_count_walk);
 	mutex_destroy(&mdev->graph_mutex);
+	media_device_put(mdev);
 }
 EXPORT_SYMBOL_GPL(media_device_cleanup);
 
@@ -763,26 +769,19 @@ int __must_check __media_device_register(struct media_device *mdev,
 	/* Set version 0 to indicate user-space that the graph is static */
 	mdev->topology_version = 0;
 
-	media_devnode_init(&mdev->devnode);
-
 	ret = media_devnode_register(&mdev->devnode, owner);
 	if (ret < 0)
-		goto out_put;
+		return ret;
 
 	ret = device_create_file(&mdev->devnode.dev, &dev_attr_model);
-	if (ret < 0)
-		goto out_unregister;
+	if (ret < 0) {
+		media_devnode_unregister(&mdev->devnode);
+		return ret;
+	}
 
 	dev_dbg(mdev->dev, "Media device registered\n");
 
 	return 0;
-
-out_unregister:
-	media_devnode_unregister(&mdev->devnode);
-out_put:
-	put_device(&mdev->devnode.dev);
-
-	return ret;
 }
 EXPORT_SYMBOL_GPL(__media_device_register);
 
@@ -825,7 +824,6 @@ void media_device_unregister(struct media_device *mdev)
 	device_remove_file(&mdev->devnode.dev, &dev_attr_model);
 	dev_dbg(mdev->dev, "Media device unregistering\n");
 	media_devnode_unregister(&mdev->devnode);
-	put_device(&mdev->devnode.dev);
 }
 EXPORT_SYMBOL_GPL(media_device_unregister);
 
-- 
2.1.4



* [RFC v3 13/21] media device: Deprecate media_device_{init,cleanup}() for drivers
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (11 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 12/21] media device: Initialise media devnode in media_device_init() Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 14/21] media device: Get the media device driver's device Sakari Ailus
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

Drivers should no longer directly allocate media_device but rely on
media_device_alloc(), media_device_get() and media_device_put() instead.
Deprecate media_device_init() and media_device_cleanup().

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 include/media/media-device.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/media/media-device.h b/include/media/media-device.h
index d29dec7..a3d8dd4 100644
--- a/include/media/media-device.h
+++ b/include/media/media-device.h
@@ -193,6 +193,10 @@ static inline __must_check int media_entity_enum_init(
  * So drivers need to first initialize the media device, register any entity
  * within the media device, create pad to pad links and then finally register
  * the media device by calling media_device_register() as a final step.
+ *
+ * Note that using this function in drivers is DEPRECATED. New drivers
+ * must use media_device_alloc() and manage references using
+ * media_device_get() and media_device_put() instead.
  */
 void media_device_init(struct media_device *mdev);
 
@@ -241,6 +245,10 @@ struct media_device *media_device_alloc(struct device *dev);
  *
  * This function that will destroy the graph_mutex that is
  * initialized in media_device_init().
+ *
+ * Note that using this function in drivers is DEPRECATED. New drivers
+ * must use media_device_alloc() and manage references using
+ * media_device_get() and media_device_put() instead.
  */
 void media_device_cleanup(struct media_device *mdev);
 
-- 
2.1.4



* [RFC v3 14/21] media device: Get the media device driver's device
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (12 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 13/21] media device: Deprecate media_device_{init,cleanup}() for drivers Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 15/21] media: Provide a way to the driver to set a private pointer Sakari Ailus
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

The struct device of the media device driver (i.e. not that of the media
devnode) is pointed to by the media device. The struct device pointer is
mostly used for debug prints.

Ensure it will stay around as long as the media device does.
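
The get/put pairing, including the allocation failure path, can be
modelled in userspace like this. It is a sketch only: ex_parent and
ex_mdev are hypothetical reductions of struct device and struct
media_device, and ex_fail_alloc is a knob invented here to force the
error path.

```c
#include <assert.h>
#include <stdlib.h>

struct ex_parent { int refcount; };		/* stands in for struct device */
struct ex_mdev { struct ex_parent *dev; };	/* reduced media device */

static int ex_fail_alloc;	/* knob: force the allocation to fail */

static struct ex_mdev *ex_mdev_alloc(struct ex_parent *dev)
{
	struct ex_mdev *mdev;

	if (!dev)
		return NULL;
	dev->refcount++;			/* get_device() */

	mdev = ex_fail_alloc ? NULL : calloc(1, sizeof(*mdev));
	if (!mdev) {
		dev->refcount--;		/* put_device() on failure */
		return NULL;
	}

	mdev->dev = dev;
	return mdev;
}

/* Runs when the last media device reference is gone. */
static void ex_mdev_release(struct ex_mdev *mdev)
{
	mdev->dev->refcount--;			/* put_device() */
	free(mdev);
}
```

Whether the allocation succeeds or fails, the parent's count returns to
its starting value once the media device is gone, so neither path leaks
a reference.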

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index d534011..8c08839 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -726,6 +726,7 @@ static void media_device_release(struct media_devnode *devnode)
 	mdev->entity_internal_idx_max = 0;
 	media_entity_graph_walk_cleanup(&mdev->pm_count_walk);
 	mutex_destroy(&mdev->graph_mutex);
+	put_device(mdev->dev);
 
 	kfree(mdev);
 }
@@ -734,9 +735,15 @@ struct media_device *media_device_alloc(struct device *dev)
 {
 	struct media_device *mdev;
 
+	dev = get_device(dev);
+	if (!dev)
+		return NULL;
+
 	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
-	if (!mdev)
+	if (!mdev) {
+		put_device(dev);
 		return NULL;
+	}
 
 	mdev->dev = dev;
 	media_device_init(mdev);
-- 
2.1.4



* [RFC v3 15/21] media: Provide a way to the driver to set a private pointer
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (13 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 14/21] media device: Get the media device driver's device Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 16/21] media: Add release callback for media device Sakari Ailus
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

Now that the media device can be allocated dynamically, drivers no
longer have a convenient way to obtain the driver private data structure.
Provide one again in the form of a private pointer passed to the
media_device_alloc() function.
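
How a driver might use the pointer is sketched below as a userspace
model. The toy_media_device and toy_driver types and the toy_* functions
are hypothetical; they only mirror the priv argument of
media_device_alloc() and the media_device_priv() accessor.

```c
#include <assert.h>
#include <stdlib.h>

/* Reduced stand-in for struct media_device: only the priv field. */
struct toy_media_device {
	void *priv;
};

/* Hypothetical driver state that used to embed the media device. */
struct toy_driver {
	int streaming;
	struct toy_media_device *mdev;
};

static struct toy_media_device *toy_media_device_alloc(void *priv)
{
	struct toy_media_device *mdev = calloc(1, sizeof(*mdev));

	if (mdev)
		mdev->priv = priv;
	return mdev;
}

static void *toy_media_device_priv(struct toy_media_device *mdev)
{
	return mdev->priv;
}

/* A callback that is handed only the media device, e.g. link_notify. */
static int toy_is_streaming(struct toy_media_device *mdev)
{
	struct toy_driver *drv = toy_media_device_priv(mdev);

	return drv->streaming;
}
```

With an embedded media device the callback would have used
container_of(); with a separately allocated one, the priv pointer is the
way back to the driver state.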

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
---
 drivers/media/media-device.c |  3 ++-
 include/media/media-device.h | 15 ++++++++++++++-
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 8c08839..5698823 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -731,7 +731,7 @@ static void media_device_release(struct media_devnode *devnode)
 	kfree(mdev);
 }
 
-struct media_device *media_device_alloc(struct device *dev)
+struct media_device *media_device_alloc(struct device *dev, void *priv)
 {
 	struct media_device *mdev;
 
@@ -747,6 +747,7 @@ struct media_device *media_device_alloc(struct device *dev)
 
 	mdev->dev = dev;
 	media_device_init(mdev);
+	mdev->priv = priv;
 
 	mdev->devnode.release = media_device_release;
 
diff --git a/include/media/media-device.h b/include/media/media-device.h
index a3d8dd4..9728d8a 100644
--- a/include/media/media-device.h
+++ b/include/media/media-device.h
@@ -52,6 +52,7 @@ struct media_entity_notify {
  * struct media_device - Media device
  * @dev:	Parent device
  * @devnode:	Media device node
+ * @priv:	A pointer to driver private data
  * @driver_name: Optional device driver name. If not set, calls to
  *		%MEDIA_IOC_DEVICE_INFO will return dev->driver->name.
  *		This is needed for USB drivers for example, as otherwise
@@ -117,6 +118,7 @@ struct media_device {
 	/* dev->driver_data points to this struct. */
 	struct device *dev;
 	struct media_devnode devnode;
+	void *priv;
 
 	char model[32];
 	char driver_name[32];
@@ -204,6 +206,7 @@ void media_device_init(struct media_device *mdev);
  * media_device_alloc() - Allocate and initialise a media device
  *
  * @dev:	The associated struct device pointer
+ * @priv:	pointer to a driver private data structure
  *
  * Allocate and initialise a media device. Returns a media device.
  * The media device is refcounted, and this function returns a media
@@ -212,7 +215,7 @@ void media_device_init(struct media_device *mdev);
  * References are taken and given using media_device_get() and
  * media_device_put().
  */
-struct media_device *media_device_alloc(struct device *dev);
+struct media_device *media_device_alloc(struct device *dev, void *priv);
 
 /**
  * media_device_get() - Get a reference to a media device
@@ -239,6 +242,16 @@ struct media_device *media_device_alloc(struct device *dev);
 	} while (0)
 
 /**
+ * media_device_priv() - Obtain the driver private pointer
+ *
+ * Returns a pointer passed to the media_device_alloc() function.
+ */
+static inline void *media_device_priv(struct media_device *mdev)
+{
+	return mdev->priv;
+}
+
+/**
  * media_device_cleanup() - Cleanups a media device element
  *
  * @mdev:	pointer to struct &media_device
-- 
2.1.4



* [RFC v3 16/21] media: Add release callback for media device
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (14 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 15/21] media: Provide a way to the driver to set a private pointer Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 17/21] v4l: Acquire a reference to the media device for every video device Sakari Ailus
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

The release callback may be used by the driver to signal the release of
the media device. This way the lifetime of the driver's own memory
allocations may be made dependent on that of the media device.
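
A userspace model of the callback ordering follows; it is a sketch, not
the patch itself. The demo_mdev type and the demo_* names are
hypothetical. As in the hunk below, the driver hook runs before the
media device memory is freed, so the hook may still dereference mdev.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_driver_state_freed;	/* set by the driver hook */

/* Hypothetical stand-in for struct media_device. */
struct demo_mdev {
	int refcount;
	void *priv;
	void (*release)(struct demo_mdev *mdev);	/* driver hook */
};

/* Driver hook: free driver memory tied to the media device lifetime. */
static void demo_driver_release(struct demo_mdev *mdev)
{
	free(mdev->priv);
	demo_driver_state_freed = 1;
}

static struct demo_mdev *demo_mdev_alloc(void *priv)
{
	struct demo_mdev *mdev = calloc(1, sizeof(*mdev));

	if (!mdev)
		return NULL;
	mdev->refcount = 1;
	mdev->priv = priv;
	return mdev;
}

/* On the last put, call the driver hook first, then free the object. */
static void demo_mdev_put(struct demo_mdev *mdev)
{
	if (--mdev->refcount > 0)
		return;
	if (mdev->release)
		mdev->release(mdev);
	free(mdev);
}
```

The driver state is freed only when the final reference is dropped, not
when the driver's remove callback runs.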

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/media-device.c | 4 ++++
 include/media/media-device.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 5698823..82ae07a 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -726,6 +726,10 @@ static void media_device_release(struct media_devnode *devnode)
 	mdev->entity_internal_idx_max = 0;
 	media_entity_graph_walk_cleanup(&mdev->pm_count_walk);
 	mutex_destroy(&mdev->graph_mutex);
+
+	if (mdev->release)
+		mdev->release(mdev);
+
 	put_device(mdev->dev);
 
 	kfree(mdev);
diff --git a/include/media/media-device.h b/include/media/media-device.h
index 9728d8a..310640a 100644
--- a/include/media/media-device.h
+++ b/include/media/media-device.h
@@ -152,6 +152,7 @@ struct media_device {
 
 	int (*link_notify)(struct media_link *link, u32 flags,
 			   unsigned int notification);
+	void (*release)(struct media_device *mdev);
 };
 
 /* We don't need to include pci.h or usb.h here */
-- 
2.1.4



* [RFC v3 17/21] v4l: Acquire a reference to the media device for every video device
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (15 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 16/21] media: Add release callback for media device Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 18/21] media-device: Postpone graph object removal until free Sakari Ailus
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

The video device depends on the existence of its media device --- if there
is one. Acquire a reference to it.
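
The conditional reference can be modelled in userspace as below. This is
a sketch with hypothetical sketch_* types; it only mirrors the idea that
registering a video device takes a media device reference if there is a
media device, and that the video device's release drops it.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_mdev { int refcount; };

struct sketch_vdev {
	struct sketch_mdev *mdev;	/* NULL if there is no media device */
};

/* Models the get done when the video device is registered. */
static void sketch_vdev_register(struct sketch_vdev *vdev,
				 struct sketch_mdev *mdev)
{
	vdev->mdev = mdev;
	if (mdev)
		mdev->refcount++;
}

/* Models the put done from the video device's release function. */
static void sketch_vdev_release(struct sketch_vdev *vdev)
{
	if (vdev->mdev)
		vdev->mdev->refcount--;
	vdev->mdev = NULL;
}
```

Each registered video device accounts for exactly one reference, and a
video device without a media device leaves the count untouched.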

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/v4l2-core/v4l2-dev.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
index e6da353..cda04ff 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -171,6 +171,9 @@ static void v4l2_device_release(struct device *cd)
 {
 	struct video_device *vdev = to_video_device(cd);
 	struct v4l2_device *v4l2_dev = vdev->v4l2_dev;
+#ifdef CONFIG_MEDIA_CONTROLLER
+	struct media_device *mdev = v4l2_dev->mdev;
+#endif
 
 	mutex_lock(&videodev_lock);
 	if (WARN_ON(video_device[vdev->minor] != vdev)) {
@@ -193,8 +196,8 @@ static void v4l2_device_release(struct device *cd)
 
 	mutex_unlock(&videodev_lock);
 
-#if defined(CONFIG_MEDIA_CONTROLLER)
-	if (v4l2_dev->mdev) {
+#ifdef CONFIG_MEDIA_CONTROLLER
+	if (mdev) {
 		/* Remove interfaces and interface links */
 		media_devnode_remove(vdev->intf_devnode);
 		if (vdev->entity.function != MEDIA_ENT_F_UNKNOWN)
@@ -220,6 +223,11 @@ static void v4l2_device_release(struct device *cd)
 	/* Decrease v4l2_device refcount */
 	if (v4l2_dev)
 		v4l2_device_put(v4l2_dev);
+
+#ifdef CONFIG_MEDIA_CONTROLLER
+	if (mdev)
+		media_device_put(mdev);
+#endif
 }
 
 static struct class video_class = {
@@ -808,6 +816,7 @@ static int video_register_media_controller(struct video_device *vdev, int type)
 
 	/* FIXME: how to create the other interface links? */
 
+	media_device_get(vdev->v4l2_dev->mdev);
 #endif
 	return 0;
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [RFC v3 18/21] media-device: Postpone graph object removal until free
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (16 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 17/21] v4l: Acquire a reference to the media device for every video device Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 19/21] omap3isp: Allocate the media device dynamically Sakari Ailus
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

The media device itself will be unregistered based on it being unbound and
the driver's remove callback being called. The graph objects themselves may
still be in use; rely on the media device release callback to release
them.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
---
 drivers/media/media-device.c | 44 ++++++++++++++++++++------------------------
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 82ae07a..beb9372 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -719,6 +719,26 @@ EXPORT_SYMBOL_GPL(media_device_init);
 static void media_device_release(struct media_devnode *devnode)
 {
 	struct media_device *mdev = to_media_device(devnode);
+	struct media_entity *entity;
+	struct media_entity *next;
+	struct media_interface *intf, *tmp_intf;
+	struct media_entity_notify *notify, *nextp;
+
+	/* Remove all entities from the media device */
+	list_for_each_entry_safe(entity, next, &mdev->entities, graph_obj.list)
+		__media_device_unregister_entity(entity);
+
+	/* Remove all entity_notify callbacks from the media device */
+	list_for_each_entry_safe(notify, nextp, &mdev->entity_notify, list)
+		__media_device_unregister_entity_notify(mdev, notify);
+
+	/* Remove all interfaces from the media device */
+	list_for_each_entry_safe(intf, tmp_intf, &mdev->interfaces,
+				 graph_obj.list) {
+		__media_remove_intf_links(intf);
+		media_gobj_destroy(&intf->graph_obj);
+		kfree(intf);
+	}
 
 	dev_dbg(devnode->parent, "Media device released\n");
 
@@ -799,38 +819,14 @@ EXPORT_SYMBOL_GPL(__media_device_register);
 
 void media_device_unregister(struct media_device *mdev)
 {
-	struct media_entity *entity;
-	struct media_entity *next;
-	struct media_interface *intf, *tmp_intf;
-	struct media_entity_notify *notify, *nextp;
-
 	if (mdev == NULL)
 		return;
 
 	mutex_lock(&mdev->graph_mutex);
-
-	/* Check if mdev was ever registered at all */
 	if (!media_devnode_is_registered(&mdev->devnode)) {
 		mutex_unlock(&mdev->graph_mutex);
 		return;
 	}
-
-	/* Remove all entities from the media device */
-	list_for_each_entry_safe(entity, next, &mdev->entities, graph_obj.list)
-		__media_device_unregister_entity(entity);
-
-	/* Remove all entity_notify callbacks from the media device */
-	list_for_each_entry_safe(notify, nextp, &mdev->entity_notify, list)
-		__media_device_unregister_entity_notify(mdev, notify);
-
-	/* Remove all interfaces from the media device */
-	list_for_each_entry_safe(intf, tmp_intf, &mdev->interfaces,
-				 graph_obj.list) {
-		__media_remove_intf_links(intf);
-		media_gobj_destroy(&intf->graph_obj);
-		kfree(intf);
-	}
-
 	mutex_unlock(&mdev->graph_mutex);
 
 	device_remove_file(&mdev->devnode.dev, &dev_attr_model);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [RFC v3 19/21] omap3isp: Allocate the media device dynamically
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (17 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 18/21] media-device: Postpone graph object removal until free Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 20/21] omap3isp: Release the isp device struct by media device callback Sakari Ailus
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

Use the new media_device_alloc() API to allocate and release the media
device.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/platform/omap3isp/isp.c      | 24 +++++++++++++-----------
 drivers/media/platform/omap3isp/isp.h      |  2 +-
 drivers/media/platform/omap3isp/ispvideo.c |  2 +-
 3 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
index 5d54e2c..565d392 100644
--- a/drivers/media/platform/omap3isp/isp.c
+++ b/drivers/media/platform/omap3isp/isp.c
@@ -1597,8 +1597,8 @@ static void isp_unregister_entities(struct isp_device *isp)
 	omap3isp_stat_unregister_entities(&isp->isp_hist);
 
 	v4l2_device_unregister(&isp->v4l2_dev);
-	media_device_unregister(&isp->media_dev);
-	media_device_cleanup(&isp->media_dev);
+	media_device_unregister(isp->media_dev);
+	media_device_put(isp->media_dev);
 }
 
 static int isp_link_entity(
@@ -1676,14 +1676,16 @@ static int isp_register_entities(struct isp_device *isp)
 {
 	int ret;
 
-	isp->media_dev.dev = isp->dev;
-	strlcpy(isp->media_dev.model, "TI OMAP3 ISP",
-		sizeof(isp->media_dev.model));
-	isp->media_dev.hw_revision = isp->revision;
-	isp->media_dev.link_notify = v4l2_pipeline_link_notify;
-	media_device_init(&isp->media_dev);
+	isp->media_dev = media_device_alloc(isp->dev, isp);
+	if (!isp->media_dev)
+		return -ENOMEM;
+
+	strlcpy(isp->media_dev->model, "TI OMAP3 ISP",
+		sizeof(isp->media_dev->model));
+	isp->media_dev->hw_revision = isp->revision;
+	isp->media_dev->link_notify = v4l2_pipeline_link_notify;
 
-	isp->v4l2_dev.mdev = &isp->media_dev;
+	isp->v4l2_dev.mdev = isp->media_dev;
 	ret = v4l2_device_register(isp->dev, &isp->v4l2_dev);
 	if (ret < 0) {
 		dev_err(isp->dev, "%s: V4L2 device registration failed (%d)\n",
@@ -2161,7 +2163,7 @@ static int isp_subdev_notifier_complete(struct v4l2_async_notifier *async)
 	struct isp_bus_cfg *bus;
 	int ret;
 
-	ret = media_entity_enum_init(&isp->crashed, &isp->media_dev);
+	ret = media_entity_enum_init(&isp->crashed, isp->media_dev);
 	if (ret)
 		return ret;
 
@@ -2179,7 +2181,7 @@ static int isp_subdev_notifier_complete(struct v4l2_async_notifier *async)
 	if (ret < 0)
 		return ret;
 
-	return media_device_register(&isp->media_dev);
+	return media_device_register(isp->media_dev);
 }
 
 /*
diff --git a/drivers/media/platform/omap3isp/isp.h b/drivers/media/platform/omap3isp/isp.h
index 7e6f663..7378279 100644
--- a/drivers/media/platform/omap3isp/isp.h
+++ b/drivers/media/platform/omap3isp/isp.h
@@ -176,7 +176,7 @@ struct isp_xclk {
 struct isp_device {
 	struct v4l2_device v4l2_dev;
 	struct v4l2_async_notifier notifier;
-	struct media_device media_dev;
+	struct media_device *media_dev;
 	struct device *dev;
 	u32 revision;
 
diff --git a/drivers/media/platform/omap3isp/ispvideo.c b/drivers/media/platform/omap3isp/ispvideo.c
index 7d9f359..45ef38c 100644
--- a/drivers/media/platform/omap3isp/ispvideo.c
+++ b/drivers/media/platform/omap3isp/ispvideo.c
@@ -1077,7 +1077,7 @@ isp_video_streamon(struct file *file, void *fh, enum v4l2_buf_type type)
 	pipe = video->video.entity.pipe
 	     ? to_isp_pipeline(&video->video.entity) : &video->pipe;
 
-	ret = media_entity_enum_init(&pipe->ent_enum, &video->isp->media_dev);
+	ret = media_entity_enum_init(&pipe->ent_enum, video->isp->media_dev);
 	if (ret)
 		goto err_enum_init;
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [RFC v3 20/21] omap3isp: Release the isp device struct by media device callback
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (18 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 19/21] omap3isp: Allocate the media device dynamically Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-08-26 23:43 ` [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management Sakari Ailus
  2016-11-07 20:16 ` [RFC v3 00/21] Make use of kref in media device, grab references as needed Shuah Khan
  21 siblings, 0 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

Use the media device release callback to release the isp device's data
structure. This approach has the benefit of not releasing memory which may
still be accessed through open file handles whilst the isp driver is being
unbound.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/platform/omap3isp/isp.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
index 565d392..689efe8 100644
--- a/drivers/media/platform/omap3isp/isp.c
+++ b/drivers/media/platform/omap3isp/isp.c
@@ -1672,6 +1672,8 @@ static int isp_link_entity(
 	return media_create_pad_link(entity, i, input, pad, flags);
 }
 
+static void isp_release(struct media_device *mdev);
+
 static int isp_register_entities(struct isp_device *isp)
 {
 	int ret;
@@ -1684,6 +1686,7 @@ static int isp_register_entities(struct isp_device *isp)
 		sizeof(isp->media_dev->model));
 	isp->media_dev->hw_revision = isp->revision;
 	isp->media_dev->link_notify = v4l2_pipeline_link_notify;
+	isp->media_dev->release = isp_release;
 
 	isp->v4l2_dev.mdev = isp->media_dev;
 	ret = v4l2_device_register(isp->dev, &isp->v4l2_dev);
@@ -1944,6 +1947,20 @@ static void isp_detach_iommu(struct isp_device *isp)
 	iommu_group_remove_device(isp->dev);
 }
 
+static void isp_release(struct media_device *mdev)
+{
+	struct isp_device *isp = media_device_priv(mdev);
+
+	isp_cleanup_modules(isp);
+	isp_xclk_cleanup(isp);
+
+	__omap3isp_get(isp, false);
+	isp_detach_iommu(isp);
+	__omap3isp_put(isp, false);
+
+	media_entity_enum_cleanup(&isp->crashed);
+}
+
 static int isp_attach_iommu(struct isp_device *isp)
 {
 	struct dma_iommu_mapping *mapping;
@@ -2004,14 +2021,6 @@ static int isp_remove(struct platform_device *pdev)
 
 	v4l2_async_notifier_unregister(&isp->notifier);
 	isp_unregister_entities(isp);
-	isp_cleanup_modules(isp);
-	isp_xclk_cleanup(isp);
-
-	__omap3isp_get(isp, false);
-	isp_detach_iommu(isp);
-	__omap3isp_put(isp, false);
-
-	media_entity_enum_cleanup(&isp->crashed);
 
 	return 0;
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (19 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 20/21] omap3isp: Release the isp device struct by media device callback Sakari Ailus
@ 2016-08-26 23:43 ` Sakari Ailus
  2016-12-15 11:23   ` Laurent Pinchart
  2016-11-07 20:16 ` [RFC v3 00/21] Make use of kref in media device, grab references as needed Shuah Khan
  21 siblings, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2016-08-26 23:43 UTC (permalink / raw)
  To: linux-media, hverkuil; +Cc: mchehab, shuahkh, laurent.pinchart

devm functions are fine for managing resources that are directly related
to the device at hand and that have no other dependencies. However, a
process holding a file handle to a device created by a driver for a device
may result in the file handle being left behind after the device is long
gone. This will result in accessing released (and potentially reallocated)
memory.

Instead, rely on the media device which will stick around until all users
are gone.

Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
---
 drivers/media/platform/omap3isp/isp.c         | 38 ++++++++++++++++++++-------
 drivers/media/platform/omap3isp/ispccp2.c     |  3 ++-
 drivers/media/platform/omap3isp/isph3a_aewb.c | 20 +++++++++-----
 drivers/media/platform/omap3isp/isph3a_af.c   | 20 +++++++++-----
 drivers/media/platform/omap3isp/isphist.c     |  5 ++--
 drivers/media/platform/omap3isp/ispstat.c     |  2 ++
 6 files changed, 63 insertions(+), 25 deletions(-)

diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
index 689efe8..262ddf7 100644
--- a/drivers/media/platform/omap3isp/isp.c
+++ b/drivers/media/platform/omap3isp/isp.c
@@ -1370,7 +1370,7 @@ static int isp_get_clocks(struct isp_device *isp)
 	unsigned int i;
 
 	for (i = 0; i < ARRAY_SIZE(isp_clocks); ++i) {
-		clk = devm_clk_get(isp->dev, isp_clocks[i]);
+		clk = clk_get(isp->dev, isp_clocks[i]);
 		if (IS_ERR(clk)) {
 			dev_err(isp->dev, "clk_get %s failed\n", isp_clocks[i]);
 			return PTR_ERR(clk);
@@ -1382,6 +1382,14 @@ static int isp_get_clocks(struct isp_device *isp)
 	return 0;
 }
 
+static void isp_put_clocks(struct isp_device *isp)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(isp_clocks); ++i)
+		clk_put(isp->clock[i]);
+}
+
 /*
  * omap3isp_get - Acquire the ISP resource.
  *
@@ -1596,7 +1604,6 @@ static void isp_unregister_entities(struct isp_device *isp)
 	omap3isp_stat_unregister_entities(&isp->isp_af);
 	omap3isp_stat_unregister_entities(&isp->isp_hist);
 
-	v4l2_device_unregister(&isp->v4l2_dev);
 	media_device_unregister(isp->media_dev);
 	media_device_put(isp->media_dev);
 }
@@ -1951,6 +1958,8 @@ static void isp_release(struct media_device *mdev)
 {
 	struct isp_device *isp = media_device_priv(mdev);
 
+	v4l2_device_unregister(&isp->v4l2_dev);
+
 	isp_cleanup_modules(isp);
 	isp_xclk_cleanup(isp);
 
@@ -1959,6 +1968,10 @@ static void isp_release(struct media_device *mdev)
 	__omap3isp_put(isp, false);
 
 	media_entity_enum_cleanup(&isp->crashed);
+
+	isp_put_clocks(isp);
+
+	kfree(isp);
 }
 
 static int isp_attach_iommu(struct isp_device *isp)
@@ -2211,7 +2224,7 @@ static int isp_probe(struct platform_device *pdev)
 	int ret;
 	int i, m;
 
-	isp = devm_kzalloc(&pdev->dev, sizeof(*isp), GFP_KERNEL);
+	isp = kzalloc(sizeof(*isp), GFP_KERNEL);
 	if (!isp) {
 		dev_err(&pdev->dev, "could not allocate memory\n");
 		return -ENOMEM;
@@ -2220,21 +2233,23 @@ static int isp_probe(struct platform_device *pdev)
 	ret = of_property_read_u32(pdev->dev.of_node, "ti,phy-type",
 				   &isp->phy_type);
 	if (ret)
-		return ret;
+		goto error_release_isp;
 
 	isp->syscon = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
 						      "syscon");
-	if (IS_ERR(isp->syscon))
-		return PTR_ERR(isp->syscon);
+	if (IS_ERR(isp->syscon)) {
+		ret = PTR_ERR(isp->syscon);
+		goto error_release_isp;
+	}
 
 	ret = of_property_read_u32_index(pdev->dev.of_node, "syscon", 1,
 					 &isp->syscon_offset);
 	if (ret)
-		return ret;
+		goto error_release_isp;
 
 	ret = isp_of_parse_nodes(&pdev->dev, &isp->notifier);
 	if (ret < 0)
-		return ret;
+		goto error_release_isp;
 
 	isp->autoidle = autoidle;
 
@@ -2251,8 +2266,8 @@ static int isp_probe(struct platform_device *pdev)
 	platform_set_drvdata(pdev, isp);
 
 	/* Regulators */
-	isp->isp_csiphy1.vdd = devm_regulator_get(&pdev->dev, "vdd-csiphy1");
-	isp->isp_csiphy2.vdd = devm_regulator_get(&pdev->dev, "vdd-csiphy2");
+	isp->isp_csiphy1.vdd = regulator_get(&pdev->dev, "vdd-csiphy1");
+	isp->isp_csiphy2.vdd = regulator_get(&pdev->dev, "vdd-csiphy2");
 
 	/* Clocks
 	 *
@@ -2384,6 +2399,9 @@ error_isp:
 	__omap3isp_put(isp, false);
 error:
 	mutex_destroy(&isp->isp_mutex);
+	isp_put_clocks(isp);
+error_release_isp:
+	kfree(isp);
 
 	return ret;
 }
diff --git a/drivers/media/platform/omap3isp/ispccp2.c b/drivers/media/platform/omap3isp/ispccp2.c
index ca09523..d49ce8a 100644
--- a/drivers/media/platform/omap3isp/ispccp2.c
+++ b/drivers/media/platform/omap3isp/ispccp2.c
@@ -1135,7 +1135,7 @@ int omap3isp_ccp2_init(struct isp_device *isp)
 	 * TODO: Don't hardcode the usage of PHY1 (shared with CSI2c).
 	 */
 	if (isp->revision == ISP_REVISION_2_0) {
-		ccp2->vdds_csib = devm_regulator_get(isp->dev, "vdds_csib");
+		ccp2->vdds_csib = regulator_get(isp->dev, "vdds_csib");
 		if (IS_ERR(ccp2->vdds_csib)) {
 			dev_dbg(isp->dev,
 				"Could not get regulator vdds_csib\n");
@@ -1163,4 +1163,5 @@ void omap3isp_ccp2_cleanup(struct isp_device *isp)
 
 	omap3isp_video_cleanup(&ccp2->video_in);
 	media_entity_cleanup(&ccp2->subdev.entity);
+	regulator_put(ccp2->vdds_csib);
 }
diff --git a/drivers/media/platform/omap3isp/isph3a_aewb.c b/drivers/media/platform/omap3isp/isph3a_aewb.c
index ccaf92f..130df8b 100644
--- a/drivers/media/platform/omap3isp/isph3a_aewb.c
+++ b/drivers/media/platform/omap3isp/isph3a_aewb.c
@@ -289,9 +289,10 @@ int omap3isp_h3a_aewb_init(struct isp_device *isp)
 {
 	struct ispstat *aewb = &isp->isp_aewb;
 	struct omap3isp_h3a_aewb_config *aewb_cfg;
-	struct omap3isp_h3a_aewb_config *aewb_recover_cfg;
+	struct omap3isp_h3a_aewb_config *aewb_recover_cfg = NULL;
+	int ret;
 
-	aewb_cfg = devm_kzalloc(isp->dev, sizeof(*aewb_cfg), GFP_KERNEL);
+	aewb_cfg = kzalloc(sizeof(*aewb_cfg), GFP_KERNEL);
 	if (!aewb_cfg)
 		return -ENOMEM;
 
@@ -301,12 +302,12 @@ int omap3isp_h3a_aewb_init(struct isp_device *isp)
 	aewb->isp = isp;
 
 	/* Set recover state configuration */
-	aewb_recover_cfg = devm_kzalloc(isp->dev, sizeof(*aewb_recover_cfg),
-					GFP_KERNEL);
+	aewb_recover_cfg = kzalloc(sizeof(*aewb_recover_cfg), GFP_KERNEL);
 	if (!aewb_recover_cfg) {
 		dev_err(aewb->isp->dev, "AEWB: cannot allocate memory for "
 					"recover configuration.\n");
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto err;
 	}
 
 	aewb_recover_cfg->saturation_limit = OMAP3ISP_AEWB_MAX_SATURATION_LIM;
@@ -323,13 +324,20 @@ int omap3isp_h3a_aewb_init(struct isp_device *isp)
 	if (h3a_aewb_validate_params(aewb, aewb_recover_cfg)) {
 		dev_err(aewb->isp->dev, "AEWB: recover configuration is "
 					"invalid.\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto err;
 	}
 
 	aewb_recover_cfg->buf_size = h3a_aewb_get_buf_size(aewb_recover_cfg);
 	aewb->recover_priv = aewb_recover_cfg;
 
 	return omap3isp_stat_init(aewb, "AEWB", &h3a_aewb_subdev_ops);
+
+err:
+	kfree(aewb_cfg);
+	kfree(aewb_recover_cfg);
+
+	return ret;
 }
 
 /*
diff --git a/drivers/media/platform/omap3isp/isph3a_af.c b/drivers/media/platform/omap3isp/isph3a_af.c
index 92937f7..7eecf97 100644
--- a/drivers/media/platform/omap3isp/isph3a_af.c
+++ b/drivers/media/platform/omap3isp/isph3a_af.c
@@ -352,9 +352,10 @@ int omap3isp_h3a_af_init(struct isp_device *isp)
 {
 	struct ispstat *af = &isp->isp_af;
 	struct omap3isp_h3a_af_config *af_cfg;
-	struct omap3isp_h3a_af_config *af_recover_cfg;
+	struct omap3isp_h3a_af_config *af_recover_cfg = NULL;
+	int ret;
 
-	af_cfg = devm_kzalloc(isp->dev, sizeof(*af_cfg), GFP_KERNEL);
+	af_cfg = kzalloc(sizeof(*af_cfg), GFP_KERNEL);
 	if (af_cfg == NULL)
 		return -ENOMEM;
 
@@ -364,12 +365,12 @@ int omap3isp_h3a_af_init(struct isp_device *isp)
 	af->isp = isp;
 
 	/* Set recover state configuration */
-	af_recover_cfg = devm_kzalloc(isp->dev, sizeof(*af_recover_cfg),
-				      GFP_KERNEL);
+	af_recover_cfg = kzalloc(sizeof(*af_recover_cfg), GFP_KERNEL);
 	if (!af_recover_cfg) {
 		dev_err(af->isp->dev, "AF: cannot allocate memory for recover "
 				      "configuration.\n");
-		return -ENOMEM;
+		ret = -ENOMEM;
+		goto err;
 	}
 
 	af_recover_cfg->paxel.h_start = OMAP3ISP_AF_PAXEL_HZSTART_MIN;
@@ -381,13 +382,20 @@ int omap3isp_h3a_af_init(struct isp_device *isp)
 	if (h3a_af_validate_params(af, af_recover_cfg)) {
 		dev_err(af->isp->dev, "AF: recover configuration is "
 				      "invalid.\n");
-		return -EINVAL;
+		ret = -EINVAL;
+		goto err;
 	}
 
 	af_recover_cfg->buf_size = h3a_af_get_buf_size(af_recover_cfg);
 	af->recover_priv = af_recover_cfg;
 
 	return omap3isp_stat_init(af, "AF", &h3a_af_subdev_ops);
+
+err:
+	kfree(af_cfg);
+	kfree(af_recover_cfg);
+
+	return ret;
 }
 
 void omap3isp_h3a_af_cleanup(struct isp_device *isp)
diff --git a/drivers/media/platform/omap3isp/isphist.c b/drivers/media/platform/omap3isp/isphist.c
index 7138b04..976cab0 100644
--- a/drivers/media/platform/omap3isp/isphist.c
+++ b/drivers/media/platform/omap3isp/isphist.c
@@ -477,9 +477,9 @@ int omap3isp_hist_init(struct isp_device *isp)
 {
 	struct ispstat *hist = &isp->isp_hist;
 	struct omap3isp_hist_config *hist_cfg;
-	int ret = -1;
+	int ret;
 
-	hist_cfg = devm_kzalloc(isp->dev, sizeof(*hist_cfg), GFP_KERNEL);
+	hist_cfg = kzalloc(sizeof(*hist_cfg), GFP_KERNEL);
 	if (hist_cfg == NULL)
 		return -ENOMEM;
 
@@ -517,6 +517,7 @@ int omap3isp_hist_init(struct isp_device *isp)
 	if (ret) {
 		if (hist->dma_ch)
 			dma_release_channel(hist->dma_ch);
+		kfree(hist_cfg);
 	}
 
 	return ret;
diff --git a/drivers/media/platform/omap3isp/ispstat.c b/drivers/media/platform/omap3isp/ispstat.c
index 1b9217d..1c1365f 100644
--- a/drivers/media/platform/omap3isp/ispstat.c
+++ b/drivers/media/platform/omap3isp/ispstat.c
@@ -1059,4 +1059,6 @@ void omap3isp_stat_cleanup(struct ispstat *stat)
 	mutex_destroy(&stat->ioctl_lock);
 	isp_stat_bufs_free(stat);
 	kfree(stat->buf);
+	kfree(stat->priv);
+	kfree(stat->recover_priv);
 }
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
                   ` (20 preceding siblings ...)
  2016-08-26 23:43 ` [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management Sakari Ailus
@ 2016-11-07 20:16 ` Shuah Khan
  2016-11-08  8:19   ` Sakari Ailus
  21 siblings, 1 reply; 89+ messages in thread
From: Shuah Khan @ 2016-11-07 20:16 UTC (permalink / raw)
  To: Sakari Ailus, linux-media, hverkuil, mchehab, laurent.pinchart,
	Shuah Khan

Hi Sakari,

On 08/26/2016 05:43 PM, Sakari Ailus wrote:
> Hi folks,
> 
> This is the third version of the RFC set to fix referencing in media
> devices.
> 
> The lifetime of the media device (and media devnode) is now bound to that
> of struct device embedded in it and its memory is only released once the
> last reference is gone: unregistering is simply unregistering, it no
> longer should release memory which could be further accessed.
>
> A video node or a sub-device node also gets a reference to the media
> device, i.e. the release function of the video device node will release
> its reference to the media device. The same goes for file handles to the
> media device.
>
> As a side effect of this, the media device is allocated together with the
> media devnode. The driver may also tie its own resources to the media
> device. Alternatively there's also a priv field to hold the driver's
> private pointer (container_of() is also an option in this case). We could
> drop one of these options but currently both are possible.
>
> I've tested this by manually unbinding the omap3isp platform device while
> streaming. Driver changes are required for this to work; by not using
> dynamic allocation (i.e. media_device_alloc()) the old behaviour is still
> supported. This is still unlikely to be a grave problem as there are not
> that many device drivers that support physically removable devices. We've
> had this problem for other devices for many years without paying much
> notice --- that doesn't mean I don't think at least drivers for removable
> devices shouldn't be changed as part of the set later on, I'd just like to
> get review comments on the approach first.
>
> The three patches that originally partially resolved some of these issues
> are reverted in the beginning of the set. I'm still posting this as an RFC
> mainly since the testing is somewhat limited so far.

The main difference between the approach taken in these 3 reverted fixes and
this RFC series is as follows:

Reverted fixes:
- Fix the lifetime problem with the media devnode by dynamically allocating
  devnode instead of media_device. One of the main considerations to this
  approach is to isolate the changes in media core and avoid changes to
  drivers.
- I tested these fixes extensively and added selftests and README file for
  the regression tests. I haven't seen any problems after these fixes went
  in while physically removing au0828 device, em28xx, and uvcvideo.

This RFC series:
- Dynamically allocates media_device
- This approach requires changes to drivers. It would be wise to not require
  churn to driver code and fix the problem in media-core.

Do you have information on the problems that still remain with the above fixes
in place? These fixes went into 4.8 if I recall correctly. Could you please
send us the list of problems and dmesg for the problems you found with the
above fixes and how this RFC series addresses them?

Can these problems be fixed without needing to change the approach in the
reverted patches?

I have a patch series on top of the fixes this RFC series is reverting
to allocate media_device only in the cases where sharing media device
is necessary. e.g: au0828 and snd-usb-audio.

Media Device Allocator API
https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html

This series has been reviewed. The work I did to change snd-usb-audio to
use Media Controller API to coordinate access to resources with au0828
is dependent on the above patch series.

snip

> The to-do list includes changes to drivers that can be physically removed.
> Drivers not using the new API can mostly ignore these changes, albeit
> media_device_init() now grabs a reference to struct device of the media
> device which must be released.
> 

As I mentioned earlier, requiring changes to drivers, thereby exposing
the fix to all drivers, is a problem with this RFC series. I would like
to understand the reasons why the current approach to allocate media
devnode and limit the changes to media-core doesn't work and also the
reasons why problems if any can't be fixed on top of these fixes.

I have a patch series on top of the fixes this RFC series is reverting
to allocate media_device only in the cases where sharing media device
is necessary. e.g: au0828 and snd-usb-audio.

Media Device Allocator API
https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html

This series has been reviewed and pending at the moment because this
RFC series takes a different approach and reverts patches that the
above work depends on. The work I did to change snd-usb-audio to use
Media Controller API to coordinate access to resources with au0828
is dependent on the above patch series.

My work is kind of in limbo now because of the conflict between the
two approaches and because my snd-usb-audio work depends on all of this.
Audio maintainer is waiting for snd-usb-audio patches to go in first and
use that work as a reference for changing other audio drivers to use the
Media Controller API.

I am hoping we can reach a consensus and move forward.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-07 20:16 ` [RFC v3 00/21] Make use of kref in media device, grab references as needed Shuah Khan
@ 2016-11-08  8:19   ` Sakari Ailus
  2016-11-09 16:49     ` Shuah Khan
  0 siblings, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2016-11-08  8:19 UTC (permalink / raw)
  To: Shuah Khan; +Cc: Sakari Ailus, linux-media, hverkuil, mchehab, laurent.pinchart

Hi Shuah,

On Mon, Nov 07, 2016 at 01:16:45PM -0700, Shuah Khan wrote:
> Hi Sakari,
> 
> On 08/26/2016 05:43 PM, Sakari Ailus wrote:
> > Hi folks,
> > 
> > This is the third version of the RFC set to fix referencing in media
> > devices.
> > 
> > The lifetime of the media device (and media devnode) is now bound to that
> > of struct device embedded in it and its memory is only released once the
> > last reference is gone: unregistering is simply unregistering, it no
> > longer should release memory which could be further accessed.
> >
> > A video node or a sub-device node also gets a reference to the media
> > device, i.e. the release function of the video device node will release
> > its reference to the media device. The same goes for file handles to the
> > media device.
> >
> > As a side effect of this, the media device is allocated together with the
> > media devnode. The driver may also tie its own resources to the media
> > device. Alternatively there's also a priv field to hold the driver's
> > private pointer (container_of() is also an option in this case). We could
> > drop one of these options but currently both are possible.
> >
> > I've tested this by manually unbinding the omap3isp platform device while
> > streaming. Driver changes are required for this to work; by not using
> > dynamic allocation (i.e. media_device_alloc()) the old behaviour is still
> > supported. This is still unlikely to be a grave problem as there are not
> > that many device drivers that support physically removable devices. We've
> > had this problem for other devices for many years without paying much
> > notice --- that doesn't mean I don't think at least drivers for removable
> > devices shouldn't be changed as part of the set later on, I'd just like to
> > get review comments on the approach first.
> >
> > The three patches that originally partially resolved some of these issues
> > are reverted in the beginning of the set. I'm still posting this as an RFC
> > mainly since the testing is somewhat limited so far.
> 
> The main difference between the approach taken in these 3 reverted fixes and
> this RFC series is as follows:
> 
> Reverted fixes:
> - Fix the lifetime problem with the media devnode by dynamically allocating
>   devnode instead of media_device. One of the main considerations to this
>   approach is to isolate the changes in media core and avoid changes to
>   drivers.
> - I tested these fixes extensively and added selftests and README file for
>   the regression tests. I haven't seen any problems after these fixes went
>   in while physically removing au0828 device, em28xx, and uvcvideo.

I'd rather call them workarounds, as they do work around the issues rather
than properly fixing them. This approach isn't really extensible to fix the
remaining problems either. It is true that *some* of the issues that were
present before these patches do not show up anymore with them, but we really
do need to fix all of these bugs.

The underlying problem is that there may be open file handles, references
from elsewhere in the kernel or such to in-memory objects that are not
refcounted properly: referencing released memory is a no-go in the kernel.
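
To make the lifetime rule concrete, here is a minimal user-space sketch of
what "refcounted properly" is meant to look like. It is a model under stated
assumptions, not kernel code: struct mdev_sketch and the mdev_sketch_*()
helpers are hypothetical names, the counter is a plain integer rather than a
kref, and there is no locking. The point it shows is only that unregistering
hides the object from new users while the memory stays valid until the last
reference is dropped.

```c
/*
 * User-space sketch only. All names below are hypothetical and are not
 * part of the media core API; a real implementation would use kref (or
 * the refcount of an embedded struct device) and proper locking.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct mdev_sketch {
	unsigned int refcount;	/* stands in for a kref */
	bool registered;	/* cleared by unregister, memory stays valid */
	void (*release)(struct mdev_sketch *mdev); /* runs on the last put */
};

static int released_count;	/* lets a caller observe when release ran */

static void mdev_sketch_release(struct mdev_sketch *mdev)
{
	released_count++;
	free(mdev);
}

static struct mdev_sketch *mdev_sketch_alloc(void)
{
	struct mdev_sketch *mdev = calloc(1, sizeof(*mdev));

	if (!mdev)
		return NULL;

	mdev->refcount = 1;	/* the allocating driver owns this reference */
	mdev->registered = true;
	mdev->release = mdev_sketch_release;

	return mdev;
}

/* Taken by every user: an open file handle, a video node and so on. */
static void mdev_sketch_get(struct mdev_sketch *mdev)
{
	mdev->refcount++;
}

/* Dropping the last reference is the only thing that frees memory. */
static void mdev_sketch_put(struct mdev_sketch *mdev)
{
	assert(mdev->refcount > 0);
	if (--mdev->refcount == 0)
		mdev->release(mdev);
}

/* Unregistering is simply unregistering: nothing is freed here. */
static void mdev_sketch_unregister(struct mdev_sketch *mdev)
{
	mdev->registered = false;
}
```

With this shape, a handle that took a reference before the unregister can
still safely look at the object afterwards; it just finds it unregistered.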

> 
> This RFC series:
> - Dynamically allocates media_device
> - This approach requires changes to drivers. It would be wise to not require
>   churn to driver code and fix the problem in media-core.
> 
> Do you have information on the problems that still remain with the above fixes
> in place? These fixes went into 4.8 is I recall correctly. Could you please
> send us the list of problems and dmesg for the problems you found with the
> above fixes and how this RFC series addresses them.

Just try removing a device when it's streaming. No more than that is needed.

This is one of the bugs fixed by the patchset, albeit drivers do need to be
changed as well to benefit from the changes. 

> 
> Can these problems be fixed without needing to change the approach in the
> reverted patches?

I don't think it's feasible, really. Besides, the workarounds were rather
ugly and were merged only because there was said to be urgency to have a
partial fix early. See above as well.

> 
> I have a patch series on top of the fixes this RFC series is reverting
> to allocate media_device only in the cases where sharing media device
> is necessary. e.g: au0828 and snd-usb-audio.
> 
> Media Device Allocator API
> https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html
> 
> This series has been reviewed. The work I did to change snd-usb-audio to
> use Media Controller API to coordinate access to resources with au0828
> is dependent on the above patch series.
> 
> snip
> 
> > The to-do list includes changes to drivers that can be physically removed.
> > Drivers not using the new API can mostly ignore these changes, albeit
> > media_device_init() now grabs a reference to struct device of the media
> > device which must be released.
> > 
> 
> As I mentioned earlier, requiring changes to drivers there by exposing
> the fix to all drivers is a problem with this RFC series. I would like
> to understand the reasons why the current approach to allocate media
> devnode and limit the changes to media-ocre doesn't work and also the
> reasons why problems if any can't be fixed on top of these fixes.

It's all about references and releasing resources and performing cleanup at
the right time. There are a number of cleanup patches as well to prepare for
the changes. Please see individual patches for detailed information.

The vast majority of the drivers do this wrong to begin with, so it's not
possible to fix the referencing problems without driver changes, the most
common issue being that drivers allocate memory using devm_*() functions.
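
As a rough sketch of why device-managed allocation is a problem here, the
following userspace model ties allocations to a fake device and frees them
all at unbind time. It only imitates the behaviour of the devm_*() helpers:
struct fake_device, fake_devm_alloc() and fake_unbind() are hypothetical
names made up for this illustration. The point is that unbind releases the
memory whether or not user space still has the device open.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of device-managed allocation. */
#define MAX_RES 8

struct fake_device {
	void *res[MAX_RES];  /* allocations to be freed at unbind */
	int nres;
	int open_handles;    /* file handles user space still holds */
};

/* Allocate memory whose lifetime is tied to the device binding. */
static void *fake_devm_alloc(struct fake_device *dev, size_t size)
{
	void *p;

	if (dev->nres == MAX_RES)
		return NULL;
	p = calloc(1, size);
	if (p)
		dev->res[dev->nres++] = p;
	return p;
}

/*
 * Unbind frees every managed allocation, regardless of open handles.
 * Returns the number of handles left pointing at released memory.
 */
static int fake_unbind(struct fake_device *dev)
{
	for (int i = 0; i < dev->nres; i++)
		free(dev->res[i]);
	dev->nres = 0;
	return dev->open_handles;
}

static int demo_devm_dangling_handles(void)
{
	struct fake_device dev = { 0 };
	int *state = fake_devm_alloc(&dev, sizeof(*state));

	if (!state)
		return -1;
	dev.open_handles = 1;      /* user space opened the node and streams */
	return fake_unbind(&dev);  /* state is freed although a handle exists */
}
```

The demo reports one handle left dangling; with allocation tied to a
reference count instead, that handle would keep the memory alive.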

> 
> I have a patch series on top of the fixes this RFC series is reverting
> to allocate media_device only in the cases where sharing media device
> is necessary. e.g: au0828 and snd-usb-audio.
> 
> Media Device Allocator API
> https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html

Could you rebase the patches on this set? I'll resend the rebased set in a
moment.

> 
> This series has been reviewed and pending at the moment because this
> RFC series takes a different approach and reverts patches that the
> above work depends on. The work I did to change snd-usb-audio to use
> Media Controller API to coordinate access to resources with au0828
> is dependent on the above patch series.
> 
> My work is kind of in the limbo now because of the conflict between the
> two approaches and that my snd-usb-audio work depends on all of this.
> Audio maintainer is waiting for snd-usb-audio patches to go in first and
> use that work as a reference for changing other audio drivers to use the
> Media Controller API.
> 
> I am hoping a reach consensus and move forward.

I'll be around on #v4l today if you'd like to chat.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-08  8:19   ` Sakari Ailus
@ 2016-11-09 16:49     ` Shuah Khan
  2016-11-09 17:00       ` Shuah Khan
  0 siblings, 1 reply; 89+ messages in thread
From: Shuah Khan @ 2016-11-09 16:49 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Sakari Ailus, linux-media, hverkuil, mchehab, laurent.pinchart,
	Shuah Khan

On 11/08/2016 01:19 AM, Sakari Ailus wrote:
> Hi Shuah,
> 
> On Mon, Nov 07, 2016 at 01:16:45PM -0700, Shuah Khan wrote:
>> Hi Sakari,
>>
>> On 08/26/2016 05:43 PM, Sakari Ailus wrote:
>>> Hi folks,
>>>
>>> This is the third version of the RFC set to fix referencing in media
>>> devices.
>>>
>>> The lifetime of the media device (and media devnode) is now bound to that
>>> of struct device embedded in it and its memory is only released once the
>>> last reference is gone: unregistering is simply unregistering, it no
>>> longer should release memory which could be further accessed.
>>>                                                                                 
>>> A video node or a sub-device node also gets a reference to the media
>>> device, i.e. the release function of the video device node will release
>>> its reference to the media device. The same goes for file handles to the
>>> media device.
>>>                                                                                 
>>> As a side effect of this is that the media device, it is allocate together
>>> with the media devnode. The driver may also rely its own resources to the
>>> media device. Alternatively there's also a priv field to hold drivers
>>> private pointer (for container_of() is an option in this case). We could
>>> drop one of these options but currently both are possible.
>>>                                                                                 
>>> I've tested this by manually unbinding the omap3isp platform device while
>>> streaming. Driver changes are required for this to work; by not using
>>> dynamic allocation (i.e. media_device_alloc()) the old behaviour is still
>>> supported. This is still unlikely to be a grave problem as there are not
>>> that many device drivers that support physically removable devices. We've
>>> had this problem for other devices for many years without paying much
>>> notice --- that doesn't mean I don't think at least drivers for removable
>>> devices shouldn't be changed as part of the set later on, I'd just like to
>>> get review comments on the approach first.
>>>                                                                                 
>>> The three patches that originally partially resolved some of these issues
>>> are reverted in the beginning of the set. I'm still posting this as an RFC
>>> mainly since the testing is somewhat limited so far.
>>
>> The main difference between the approach taken in these 3 reverted fixes and
>> this RFC series is as follows:
>>
>> Reverted fixes:
>> - Fix the lifetime problem with the media devnode by dynamically allocating
>>   devnode instead of media_device. One of the main considerations to this
>>   approach is to isolate the changes in media core and avoid changes to
>>   drivers.
>> - I tested these fixes extensively and added selftests and README file for
>>   the regression tests. I haven't seen any problems after these fixes went
>>   in while physically removing au0828 device, em028xx, and uvcvideo
> 
> I'd rather call them workarounds, as they do work around the issues rather
> than properly fixing them. This approach isn't really extensible to fix the
> remaining problems either. It is true that *some* of the issues that were
> present before these patches do not show up anymore with them, but we really
> do need to fix all of these bugs.
> 
> The underlying problem is that there may be opened file handles, references
> from elsewhere in the kernel or such to in-memory objects that are not
> refcounted properly: referencing released memory is a no-go in kernel.
> 
>>
>> This RFC series:
>> - Dynamically allocates media_device
>> - This approach requires changes to drivers. It would be wise to not require
>>   churn to driver code and fix the problem in media-core.
>>
>> Do you have information on the problems that still remain with the above fixes
>> in place? These fixes went into 4.8 is I recall correctly. Could you please
>> send us the list of problems and dmesg for the problems you found with the
>> above fixes and how this RFC series addresses them.
> 
> Just try removing a device when it's streaming. No more than that is needed.
> 
> This is one of the bugs fixed by the patchset, albeit drivers do need to be
> changed as well to benefit from the changes. 
> 
>>
>> Can these problems be fixed without needing to change the approach in the
>> reverted patches?
> 
> I don't think it's feasible, really. Besides, the workaround were rather
> ugly and were merged only since there was a said urgency to have a partial
> fix early. See above as well.
> 
>>
>> I have a patch series on top of the fixes this RFC series is reverting
>> to allocate media_device only in the cases where sharing media device
>> is necessary. e.g: au0828 and snd-usb-audio.
>>
>> Media Device Allocator API
>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html
>>
>> This series has been reviewed. The work I did to change snd-usb-audio to
>> use Media Controller API to coordinate access to resources with au0828
>> is dependent on the above patch series.
>>
>> snip
>>
>>> The to-do list includes changes to drivers that can be physically removed.
>>> Drivers not using the new API can mostly ignore these changes, albeit
>>> media_device_init() now grabs a reference to struct device of the media
>>> device which must be released.
>>>
>>

Can you send me the dmesg for this problem? I think this issue is a generic
issue with the video device release path and is independent of whether or
not the driver uses the Media Controller API.

>> As I mentioned earlier, requiring changes to drivers there by exposing
>> the fix to all drivers is a problem with this RFC series. I would like
>> to understand the reasons why the current approach to allocate media
>> devnode and limit the changes to media-ocre doesn't work and also the
>> reasons why problems if any can't be fixed on top of these fixes.
> 
> It's all about references and releasing resources and performing cleanup at
> the right time. There are a number of cleanup patches as well to prepare for
> the changes. Please see individual patches for detailed information.
> 
> The vast majority of the drivers does this wrong to begin with so it's not
> possible to fix the referencing problems without driver changes, the most
> common issue being that drivers allocate memory using devm_*() functions.
> 
>>
>> I have a patch series on top of the fixes this RFC series is reverting
>> to allocate media_device only in the cases where sharing media device
>> is necessary. e.g: au0828 and snd-usb-audio.
>>
>> Media Device Allocator API
>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html
> 
> Could you rebase the patches on this set? I'll resend the rebased set in a
> moment.
> 
>>
>> This series has been reviewed and pending at the moment because this
>> RFC series takes a different approach and reverts patches that the
>> above work depends on. The work I did to change snd-usb-audio to use
>> Media Controller API to coordinate access to resources with au0828
>> is dependent on the above patch series.
>>
>> My work is kind of in the limbo now because of the conflict between the
>> two approaches and that my snd-usb-audio work depends on all of this.
>> Audio maintainer is waiting for snd-usb-audio patches to go in first and
>> use that work as a reference for changing other audio drivers to use the
>> Media Controller API.
>>
>> I am hoping a reach consensus and move forward.
> 
> I'll be around on #v4l today if you'd like to chat.
> 

We can chat on IRC. Also, media_device needs to be sharable across drivers
for snd-usb-audio and au0828 to share it. In your RFC series, media_device
isn't sharable. Would it be possible for you to take a look at the Media
Device Allocator API patches I sent out and see if you can do your work
on top of those?

Maybe we can get the Media Device Allocator API work in and then we can
get your RFC series in after that. Here is what I propose:

- Keep the fixes in 4.9
- Get Media Device Allocator API patches into 4.9.
- snd-usb-audio work goes into 4.10

Then your RFC series could go in. Looking at the RFC series, I see that
the drivers need to change as well, so this RFC work could take longer.
Since we have to make media_device sharable, the global list approach that
the Media Device Allocator API takes is necessary. So it is possible
for your RFC series to go on top of the Media Device Allocator API.

thanks,
-- Shuah

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-09 16:49     ` Shuah Khan
@ 2016-11-09 17:00       ` Shuah Khan
  2016-11-09 17:46         ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 89+ messages in thread
From: Shuah Khan @ 2016-11-09 17:00 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Sakari Ailus, linux-media, hverkuil, mchehab, laurent.pinchart,
	Shuah Khan

On 11/09/2016 09:49 AM, Shuah Khan wrote:
> On 11/08/2016 01:19 AM, Sakari Ailus wrote:
>> Hi Shuah,
>>
>> On Mon, Nov 07, 2016 at 01:16:45PM -0700, Shuah Khan wrote:
>>> Hi Sakari,
>>>
>>> On 08/26/2016 05:43 PM, Sakari Ailus wrote:
>>>> Hi folks,
>>>>
>>>> This is the third version of the RFC set to fix referencing in media
>>>> devices.
>>>>
>>>> The lifetime of the media device (and media devnode) is now bound to that
>>>> of struct device embedded in it and its memory is only released once the
>>>> last reference is gone: unregistering is simply unregistering, it no
>>>> longer should release memory which could be further accessed.
>>>>                                                                                 
>>>> A video node or a sub-device node also gets a reference to the media
>>>> device, i.e. the release function of the video device node will release
>>>> its reference to the media device. The same goes for file handles to the
>>>> media device.
>>>>                                                                                 
>>>> As a side effect of this is that the media device, it is allocate together
>>>> with the media devnode. The driver may also rely its own resources to the
>>>> media device. Alternatively there's also a priv field to hold drivers
>>>> private pointer (for container_of() is an option in this case). We could
>>>> drop one of these options but currently both are possible.
>>>>                                                                                 
>>>> I've tested this by manually unbinding the omap3isp platform device while
>>>> streaming. Driver changes are required for this to work; by not using
>>>> dynamic allocation (i.e. media_device_alloc()) the old behaviour is still
>>>> supported. This is still unlikely to be a grave problem as there are not
>>>> that many device drivers that support physically removable devices. We've
>>>> had this problem for other devices for many years without paying much
>>>> notice --- that doesn't mean I don't think at least drivers for removable
>>>> devices shouldn't be changed as part of the set later on, I'd just like to
>>>> get review comments on the approach first.
>>>>                                                                                 
>>>> The three patches that originally partially resolved some of these issues
>>>> are reverted in the beginning of the set. I'm still posting this as an RFC
>>>> mainly since the testing is somewhat limited so far.
>>>
>>> The main difference between the approach taken in these 3 reverted fixes and
>>> this RFC series is as follows:
>>>
>>> Reverted fixes:
>>> - Fix the lifetime problem with the media devnode by dynamically allocating
>>>   devnode instead of media_device. One of the main considerations to this
>>>   approach is to isolate the changes in media core and avoid changes to
>>>   drivers.
>>> - I tested these fixes extensively and added selftests and README file for
>>>   the regression tests. I haven't seen any problems after these fixes went
>>>   in while physically removing au0828 device, em028xx, and uvcvideo
>>
>> I'd rather call them workarounds, as they do work around the issues rather
>> than properly fixing them. This approach isn't really extensible to fix the
>> remaining problems either. It is true that *some* of the issues that were
>> present before these patches do not show up anymore with them, but we really
>> do need to fix all of these bugs.
>>
>> The underlying problem is that there may be opened file handles, references
>> from elsewhere in the kernel or such to in-memory objects that are not
>> refcounted properly: referencing released memory is a no-go in kernel.
>>
>>>
>>> This RFC series:
>>> - Dynamically allocates media_device
>>> - This approach requires changes to drivers. It would be wise to not require
>>>   churn to driver code and fix the problem in media-core.
>>>
>>> Do you have information on the problems that still remain with the above fixes
>>> in place? These fixes went into 4.8 is I recall correctly. Could you please
>>> send us the list of problems and dmesg for the problems you found with the
>>> above fixes and how this RFC series addresses them.
>>
>> Just try removing a device when it's streaming. No more than that is needed.
>>
>> This is one of the bugs fixed by the patchset, albeit drivers do need to be
>> changed as well to benefit from the changes. 
>>
>>>
>>> Can these problems be fixed without needing to change the approach in the
>>> reverted patches?
>>
>> I don't think it's feasible, really. Besides, the workaround were rather
>> ugly and were merged only since there was a said urgency to have a partial
>> fix early. See above as well.
>>
>>>
>>> I have a patch series on top of the fixes this RFC series is reverting
>>> to allocate media_device only in the cases where sharing media device
>>> is necessary. e.g: au0828 and snd-usb-audio.
>>>
>>> Media Device Allocator API
>>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
>>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
>>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html
>>>
>>> This series has been reviewed. The work I did to change snd-usb-audio to
>>> use Media Controller API to coordinate access to resources with au0828
>>> is dependent on the above patch series.
>>>
>>> snip
>>>
>>>> The to-do list includes changes to drivers that can be physically removed.
>>>> Drivers not using the new API can mostly ignore these changes, albeit
>>>> media_device_init() now grabs a reference to struct device of the media
>>>> device which must be released.
>>>>
>>>
> 
> Can you send me the dmesg for this problem? I think this issue is a generic
> issue with videoDev release path and is independent of whether or not the
> driver uses media coontroller api.
> 
>>> As I mentioned earlier, requiring changes to drivers there by exposing
>>> the fix to all drivers is a problem with this RFC series. I would like
>>> to understand the reasons why the current approach to allocate media
>>> devnode and limit the changes to media-ocre doesn't work and also the
>>> reasons why problems if any can't be fixed on top of these fixes.
>>
>> It's all about references and releasing resources and performing cleanup at
>> the right time. There are a number of cleanup patches as well to prepare for
>> the changes. Please see individual patches for detailed information.
>>
>> The vast majority of the drivers does this wrong to begin with so it's not
>> possible to fix the referencing problems without driver changes, the most
>> common issue being that drivers allocate memory using devm_*() functions.
>>
>>>
>>> I have a patch series on top of the fixes this RFC series is reverting
>>> to allocate media_device only in the cases where sharing media device
>>> is necessary. e.g: au0828 and snd-usb-audio.
>>>
>>> Media Device Allocator API
>>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg98793.html
>>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97779.html
>>> https://www.mail-archive.com/linux-media@vger.kernel.org/msg97704.html
>>
>> Could you rebase the patches on this set? I'll resend the rebased set in a
>> moment.
>>
>>>
>>> This series has been reviewed and pending at the moment because this
>>> RFC series takes a different approach and reverts patches that the
>>> above work depends on. The work I did to change snd-usb-audio to use
>>> Media Controller API to coordinate access to resources with au0828
>>> is dependent on the above patch series.
>>>
>>> My work is kind of in the limbo now because of the conflict between the
>>> two approaches and that my snd-usb-audio work depends on all of this.
>>> Audio maintainer is waiting for snd-usb-audio patches to go in first and
>>> use that work as a reference for changing other audio drivers to use the
>>> Media Controller API.
>>>
>>> I am hoping a reach consensus and move forward.
>>
>> I'll be around on #v4l today if you'd like to chat.
>>
> 
> We can chat on irc. Also, media_device needs to be sharable across drivers
> for snd-usb-audio and au0828 to share it. In your RFC series, media_device
> isn't sharable. Would it be possible for you to take a look at the Media
> Device Allocator API patches I sent out and see if you can do your work
> on top of those.
> 
> Maybe we can get the Media Device Allocator API work in and then we can
> get your RFC series in after that. Here is what I propose:
> 
> - Keep the fixes in 4.9
> - Get Media Device Allocator API patches into 4.9.

I meant 4.10 not 4.9

> - snd-usb-auido work go into 4.10
> 
> Then your RFC series could go in. I am looking at the RFC series and that
> the drivers need to change as well, so this RFC work could take longer.
> Since we have to make media_device sharable, it is necessary to have a
> global list approach Media Device Allocator API takes. So it is possible
> for your RFC series to go on top of the Media Device Allocator API.
> 
> thanks,
> -- Shuah
> 


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-09 17:00       ` Shuah Khan
@ 2016-11-09 17:46         ` Mauro Carvalho Chehab
  2016-11-14 13:27           ` Sakari Ailus
  0 siblings, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-11-09 17:46 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Sakari Ailus, Sakari Ailus, linux-media, hverkuil, laurent.pinchart

Em Wed, 9 Nov 2016 10:00:58 -0700
Shuah Khan <shuahkh@osg.samsung.com> escreveu:

> > Maybe we can get the Media Device Allocator API work in and then we can
> > get your RFC series in after that. Here is what I propose:
> > 
> > - Keep the fixes in 4.9

Fixes should always be kept. Reverting a fix is not an option.
Instead, do incremental patches on top of it.

> > - Get Media Device Allocator API patches into 4.9.  
> 
> I meant 4.10 not 4.9
> 
> > - snd-usb-auido work go into 4.10

Sounds like a plan.

> > Then your RFC series could go in. I am looking at the RFC series and that
> > the drivers need to change as well, so this RFC work could take longer.
> > Since we have to make media_device sharable, it is necessary to have a
> > global list approach Media Device Allocator API takes. So it is possible
> > for your RFC series to go on top of the Media Device Allocator API.

Firstly, the RFC series should be converted into something that can
be applied upstream, e.g.:

- doing the changes on top of upstream, instead of needing to
  revert patches;

- change all drivers as the kAPI changes;

- be git bisectable, i.e. all patches should compile and run fine
  after each single patch, without introducing regressions.

That probably means that the series should be tested not only on
omap3, but also on some other device drivers.


-- 
Thanks,
Mauro

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-09 17:46         ` Mauro Carvalho Chehab
@ 2016-11-14 13:27           ` Sakari Ailus
  2016-11-22 17:44             ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2016-11-14 13:27 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shuah Khan, Sakari Ailus, linux-media, hverkuil, laurent.pinchart

Hi Mauro,

I'm replying below but let me first summarise the remaining problem area
that this patchset addresses.

The problems you and Shuah have seen and partially addressed are related to
a larger picture, which is the lifetime of (mostly) memory resources related
to various objects used by both the Media controller and V4L2 frameworks
(including videobuf2) as well as the drivers which make use of these
frameworks.

The Media controller and V4L2 interfaces exposed by drivers consist of
multiple device nodes, data structures with interdependencies within the
frameworks themselves and dependencies from the driver's own data structures
towards the framework data structures. The Media device and the media graph
objects are central to the problem area as well.

So what are the issues then? Until now, we've attempted to regulate the
users' ability to access the devices at the time they're being unregistered
(and the associated memory released), but that approach does not really
scale: you have to make sure that the unregistering also will not take place
_during_ the system call --- not just in the beginning of it.

The media graph contains media graph objects, some of which are media
entities (contained in struct video_device or struct v4l2_subdev, for
instance). Media entities as graph nodes have links to other entities. In
order to implement the system calls, the drivers parse this graph to obtain
the information they need. For instance, it's not uncommon for an
implementation of video node format enumeration to figure out which
sub-device the link from that video node leads to. Drivers may also have
similar paths they follow.

Interrupt handling may also be taking place during device removal, while a
number of data structures are being freed. This really does call for a
solution based on reference counting.

This leads to the conclusion that all the memory resources that could be
accessed by the drivers or the kernel frameworks must stay intact until the
last file handle to the said devices is closed. Otherwise, there is a
possibility of accessing released memory.

Right now in a lot of cases, such as for video device and sub-device
nodes, we do release the memory when a device (as in struct device) is being
unregistered. There simply is no way in the current mainline kernel to do
this safely. Drivers do use the devm_*() family of functions to allocate
the memory of the media graph objects and their internal data structures.

With this patchset:

- The media_device which again contains the media_devnode is allocated
  dynamically. The lifetime of the media device --- and the media graph
  objects it contains --- is bound to device nodes that are bound to the
  media device (video and sub-device nodes) as well as open file handles.

- Care is taken that the unregistration process and releasing memory happens
  in the right order. This was not always the case previously.

- The driver remains responsible for the memory of the video and sub-device
  nodes. However, now the Media controller provides a convenient callback to
  the driver to release any memory resources when the time has come to do
  so. This takes place just before the media device memory is released.

- Drivers that do not strictly need to be removable require no changes. The
  benefits of this set become tangible for any driver by changing how the
  driver allocates memory for the data structures. Ideally at least
  drivers for hot-removable devices should be converted.
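
The ordering described in the list above can be sketched roughly as follows.
This is a userspace model only: struct fake_mdev, fake_mdev_get(),
fake_mdev_put() and the release callback are hypothetical stand-ins, not the
functions added by the patchset. Each event is recorded as a letter: 'U' for
unregistration, 'D' for the driver's release callback and 'F' for the core
freeing the media device memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fake_mdev;
typedef void (*release_fn)(struct fake_mdev *mdev);

/* Hypothetical model of a dynamically allocated, refcounted media device. */
struct fake_mdev {
	int refcount;
	release_fn release;  /* driver callback, may be NULL */
	void *driver_priv;   /* driver-owned memory */
	char *log;           /* caller-provided buffer recording events */
};

static void log_event(struct fake_mdev *m, char c)
{
	size_t n = strlen(m->log);

	m->log[n] = c;
	m->log[n + 1] = '\0';
}

static struct fake_mdev *fake_mdev_alloc(release_fn release, char *log)
{
	struct fake_mdev *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->refcount = 1;  /* reference held by the driver */
	m->release = release;
	m->log = log;
	return m;
}

static void fake_mdev_get(struct fake_mdev *m)
{
	m->refcount++;
}

static void fake_mdev_put(struct fake_mdev *m)
{
	if (--m->refcount > 0)
		return;
	if (m->release)
		m->release(m);  /* driver frees its resources first */
	log_event(m, 'F');
	free(m);            /* then the media device memory goes */
}

/* Driver callback: release driver-owned memory. */
static void drv_release(struct fake_mdev *m)
{
	free(m->driver_priv);
	m->driver_priv = NULL;
	log_event(m, 'D');
}

/* log must have room for at least 4 bytes. */
static void demo_release_order(char *log)
{
	struct fake_mdev *m;

	log[0] = '\0';
	m = fake_mdev_alloc(drv_release, log);
	if (!m)
		return;
	m->driver_priv = malloc(16);
	fake_mdev_get(m);    /* video node bound to the media device */
	fake_mdev_get(m);    /* open file handle */
	log_event(m, 'U');
	fake_mdev_put(m);    /* unregister: driver drops its reference */
	fake_mdev_put(m);    /* video node released */
	fake_mdev_put(m);    /* last handle closed: callback, then free */
}
```

After demo_release_order() the log reads "UDF": nothing is released at
unregistration, and the driver callback runs just before the core frees the
media device.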

In order to make the current drivers behave well it is necessary to make
changes to how memory is allocated in the drivers. If you look at the sample
patches that are part of the set for the omap3isp driver, you'll find that
around 95% of the changes are related to removing the use of the devm_*()
family of functions rather than to Media controller API changes. In this
regard, the approach taken here requires very little if any additional
overhead.

On Wed, Nov 09, 2016 at 03:46:08PM -0200, Mauro Carvalho Chehab wrote:
> Em Wed, 9 Nov 2016 10:00:58 -0700
> Shuah Khan <shuahkh@osg.samsung.com> escreveu:
> 
> > > Maybe we can get the Media Device Allocator API work in and then we can
> > > get your RFC series in after that. Here is what I propose:
> > > 
> > > - Keep the fixes in 4.9
> 
> Fixes should always be kept. Reverting a fix is not an option.
> Instead, do incremental patches on the top of it.
> 
> > > - Get Media Device Allocator API patches into 4.9.  
> > 
> > I meant 4.10 not 4.9
> > 
> > > - snd-usb-auido work go into 4.10
> 
> Sounds like a plan.
> 
> > > Then your RFC series could go in. I am looking at the RFC series and that
> > > the drivers need to change as well, so this RFC work could take longer.
> > > Since we have to make media_device sharable, it is necessary to have a
> > > global list approach Media Device Allocator API takes. So it is possible
> > > for your RFC series to go on top of the Media Device Allocator API.
> 
> Firstly, the RFC series should be converted into something that can
> be applicable upstream, e. g.:
> 
> - doing the changes over the top of upstream, instead of needing to
>   revert patches;

The patches are in fact on top of the current media-tree, or were when they
were sent (v4).

I'm reverting the patches because they were merged not as a sound way
forward for the Media controller framework, but because they partially
worked around issues in a device being in use while it was removed.

They never were a complete fix for these problems, nor do I think they could
be extended to be such. There were also unaddressed issues in these patches
pointed out during the review. For these reasons I'm reverting the three
patches. In more detail:

* media: fix media devnode ioctl/syscall and unregister race
  6f0dd24a084a

The patch clears the registered bit before performing the steps related to
unregistering a media device, but the bit is checked only at the beginning
of the IOCTL call. As unregistering a device and an IOCTL call on a file
handle of that device are not serialised, nothing guarantees the IOCTL call
will finish with the registered bit still in the same state. Serialising the
two e.g. by using a mutex is hardly a feasible solution for this.
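
A deterministic sketch of the race follows, with the unregistration placed
by hand between the check and the use. The names (struct fake_devnode,
ioctl_check_only(), ioctl_with_ref()) are hypothetical, and data_valid
merely stands in for memory that unregistration tears down. The second
variant assumes a reference is held across the call, as an open file handle
would in the series.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_devnode {
	bool registered;
	int refcount;
	bool data_valid;  /* stands in for memory torn down on release */
};

/* Workaround model: clear the bit and tear down immediately. */
static void unregister_teardown_now(struct fake_devnode *n)
{
	n->registered = false;
	n->data_valid = false;
}

/* Refcount model: teardown happens only when the last reference goes. */
static void node_put(struct fake_devnode *n)
{
	if (--n->refcount == 0)
		n->data_valid = false;
}

static void unregister_deferred(struct fake_devnode *n)
{
	n->registered = false;
	node_put(n);  /* drop the driver's reference */
}

/*
 * The registered bit is checked only at the start of the call.
 * Returns -1 if rejected, 1 if torn-down data would be touched, 0 if safe.
 */
static int ioctl_check_only(struct fake_devnode *n)
{
	if (!n->registered)
		return -1;
	unregister_teardown_now(n);    /* concurrent unregister lands here */
	return n->data_valid ? 0 : 1;  /* use after the check */
}

/* Same call, but a reference is held for its duration. */
static int ioctl_with_ref(struct fake_devnode *n)
{
	int stale;

	if (!n->registered)
		return -1;
	n->refcount++;                 /* reference held across the call */
	unregister_deferred(n);        /* concurrent unregister lands here */
	stale = n->data_valid ? 0 : 1;
	node_put(n);                   /* teardown happens only now */
	return stale;
}
```

With a node initialised as { true, 1, true }, ioctl_check_only() returns 1
(the check passed but the data went away mid-call), while ioctl_with_ref()
returns 0 and the teardown takes place after the call completes.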

I may have pointed out the original problem but this is not the solution.

<URL:http://www.spinics.net/lists/linux-media/msg101295.html>

The right solution is instead to make sure the data structures related to
the media device will not disappear while the IOCTL call is in progress (at
least).

* media: fix use-after-free in cdev_put() when app exits after driver unbind
  5b28dde51d0c

The patch avoids the problem of deleting a character device (cdev_del())
after its memory has been released. The change is sound as such but the
problem is addressed by another, much simpler patch in my series:

<URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>

It might be possible to reasonably continue from here if the next patch to
be reverted did not depend on this one.

* media-device: dynamically allocate struct media_devnode

This creates a two-way dependency between struct media_devnode and
media_device. This is very much against the original design which clearly
separates the two: media_devnode is entirely independent of media_device.

The original intent was that another sub-system in the kernel such as the
V4L2 could make use of media_devnode as well and while that hasn't happened,
perhaps the two could be merged. There simply are no other reasons to keep
the two structs separate.

The patch is certainly a workaround, as it (partially, again) works around
issues in timing of releasing memory and accessing it.

The proper solution regarding media_device and media_devnode is either to
maintain the separation or to unify the two, and this patch neither does nor
suggests either of these. On the contrary: it makes both impossible by
design, and this reason alone is enough to revert it.

The set I'm pushing maintains the separation and leaves the option of either
merging the two (media_device and media_devnode) or making use of
media_devnode elsewhere open.

> - change all drivers as the kAPI changes;

The patchset actually adds new APIs rather than changing the OLD one --- as
the old one was simply that drivers were responsible for allocating the data
structures related to a media device. Existing drivers should continue to
work as they did before without changes.

Naturally, to get the full benefits of the changes, driver changes will also
be required (see the beginning of the message).

The set has been posted as RFC in order to get reviews. It makes no sense to
convert all the drivers and then start changing APIs, affecting all those
converted drivers.

> 
> - be git bisectable, e. g. all patches should compile and run fine
>   after each single patch, without introducing regressions.

Compilation has already been tested (on ARM) on each patch applied in order.

> 
> That probably means that the series should be tested not only on
> omap3, but also on some other device drivers.

I fully agree with that. More review, testing and changes to at least some
drivers (mostly for removable devices) will be needed before merging them,
that's for sure.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-14 13:27           ` Sakari Ailus
@ 2016-11-22 17:44             ` Mauro Carvalho Chehab
  2016-11-22 18:13               ` Hans Verkuil
                                 ` (2 more replies)
  0 siblings, 3 replies; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-11-22 17:44 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Shuah Khan, Sakari Ailus, linux-media, hverkuil, laurent.pinchart

Em Mon, 14 Nov 2016 15:27:22 +0200
Sakari Ailus <sakari.ailus@iki.fi> escreveu:

> Hi Mauro,
> 
> I'm replying below but let me first summarise the remaining problem area
> that this patchset addresses.

Sorry for answering so late. Somehow, I missed this email in the cloud.

> The problems you and Shuah have seen and partially addressed are related to
> a larger picture which is the lifetime of (mostly) memory resources related
> to various objects used by as well both the Media controller and V4L2
> frameworks (including videobuf2) as the drivers which make use of these
> frameworks.
> 
> The Media controller and V4L2 interfaces exposed by drivers consist of
> multiple devices nodes, data structures with interdependencies within the
> frameworks themselves and dependencies from the driver's own data structures
> towards the framework data structures. The Media device and the media graph
> objects are central to the problem area as well.
> 
> So what are the issues then? Until now, we've attempted to regulate the
> users' ability to access the devices at the time they're being unregistered
> (and the associated memory released), but that approach does not really
> scale: you have to make sure that the unregistering also will not take place
> _during_ the system call --- not just in the beginning of it.
>
> The media graph contains media graph objects, some of which are media
> entities (contained in struct video_device or struct v4l2_subdev, for
> instance). Media entities as graph nodes have links to other entities. In
> order to implement the system calls, the drivers do parse this graph in
> order to obtain information they need to obtain from it. For instance, it's
> not uncommon for an implementation for video node format enumeration to
> figure out which sub-device the link from that video nodes leads to. Drivers
> may also have similar paths they follow.
> 
> Interrupt handling may also be taking place during the device removal during
> which a number of data structures are now freed. This really does call for a
> solution based on reference counting.
> 
> This leads to the conclusion that all the memory resources that could be
> accessed by the drivers or the kernel frameworks must stay intact until the
> last file handle to the said devices is closed. Otherwise, there is a
> possibility of accessing released memory.

So far, we're aligned.

> Right now in a lot of the cases, such as for video device and sub-device
> nodes, we do release the memory when a device (as in struct device) is being
> unregistered. There simply is in the current mainline kernel a way to do
> this in a safe way.

> Drivers do use devm_() family of functions to allocate
> the memory of the media graph object and their internal data structures.

Removing devm_() from those drivers seems to be the first thing to do,
and it is independent of any MC rework.

As you'll see below, we have different opinions on other matters,
so my suggestion about how to proceed is that you should first
submit the things we're aligned on.

In other words, please submit the patches that get rid of devm_()
first. Then, we can address the remaining stuff.

> 
> With this patchset:
> 
> - The media_device which again contains the media_devnode is allocated
>   dynamically. The lifetime of the media device --- and the media graph
>   objects it contains --- is bound to device nodes that are bound to the
>   media device (video and sub-device nodes) as well as open file handles.

No. Data structures with cdev embedded into them have their lifetime
controlled by the driver's core, and are destroyed only when there's
no pending fops. The current approach uses device's core dev.release()
callback to release memory.

In other words, dev.release() is only called after the driver's base
knows that the cdev is not in use anymore. So, no ioctl() or any
other syscall can be running at that point.

Ok, nothing prevents some driver from doing the wrong thing, keeping a
copy of struct device and using it after free, for example storing
it in devm-allocated memory and printing some debug message
after struct device is freed, but this is a driver bug.

What really worries me about this series is that it seems that you
didn't understand how the current approach works. So, you decided
to just revert it and start from scratch. This is dangerous, as
it could cause problems in scenarios other than yours.

> - Care is taken that the unregistration process and releasing memory happens
>   in the right order. This was not always the case previously.

Freeing memory for struct media_devnode, struct device and struct cdev
is currently handled by the driver's core, when it is known to be safe,
and using the same logic that other subsystems do.

We might do it differently, but we need a strong reason to do it, as
going away from the usual practice is dangerous.

> - The driver remains responsible for the memory of the video and sub-device
>   nodes. However, now the Media controller provides a convenient callback to
>   the driver to release any memory resources when the time has come to do
>   so. This takes place just before the media device memory is released.

Drivers could use devnode->dev.release for that. Of course, if they
override it, they should be calling media_devnode_release() on their
internal release functions.

> - Drivers that do not strictly need to be removable require no changes. The
>   benefits of this set become tangible for any driver by changing how the
>   driver allocates memory for the data structures. Ideally at least
>   drivers for hot-removable devices should be converted.

Drivers should allow device removal and/or driver removal. If you're
doing any change here, you need to touch *all* drivers to use the new 
way.

> In order to make the current drivers to behave well it is necessary to make
> changes to how memory is allocated in the drivers. If you look at the sample
> patches that are part of the set for the omap3isp driver, you'll find that
> around 95% of the changes are related to removing the user of devm_() family
> of functions instead of Media controller API changes. In this regard, the
> approach taken here requires very little if any additional overhead.

Well, first send the patches that do the 95% of the changes, e. g. the
devm_() removal, and check that you aren't using any dev_foo() printk after
unregister. Send that patch series without RFC. Then test what's still
broken, if anything, and let's discuss your results in a way that lets us
all reproduce the issues you may be facing on other drivers that don't
use devm*().


> On Wed, Nov 09, 2016 at 03:46:08PM -0200, Mauro Carvalho Chehab wrote:
> > Em Wed, 9 Nov 2016 10:00:58 -0700
> > Shuah Khan <shuahkh@osg.samsung.com> escreveu:
> >   
> > > > Maybe we can get the Media Device Allocator API work in and then we can
> > > > get your RFC series in after that. Here is what I propose:
> > > > 
> > > > - Keep the fixes in 4.9  
> > 
> > Fixes should always be kept. Reverting a fix is not an option.
> > Instead, do incremental patches on the top of it.
> >   
> > > > - Get Media Device Allocator API patches into 4.9.    
> > > 
> > > I meant 4.10 not 4.9
> > >   
> > > > - snd-usb-auido work go into 4.10  
> > 
> > Sounds like a plan.
> >   
> > > > Then your RFC series could go in. I am looking at the RFC series and that
> > > > the drivers need to change as well, so this RFC work could take longer.
> > > > Since we have to make media_device sharable, it is necessary to have a
> > > > global list approach Media Device Allocator API takes. So it is possible
> > > > for your RFC series to go on top of the Media Device Allocator API.  
> > 
> > Firstly, the RFC series should be converted into something that can
> > be applicable upstream, e. g.:
> > 
> > - doing the changes over the top of upstream, instead of needing to
> >   revert patches;  
> 
> The patches are in fact on top of the current media-tree, or were when they
> were sent (v4).
> 
> The reason I'm reverting patches is that the reason why these patches were
> merged was not because they would have been a sound way forward for the
> Media controller framework, but because they partially worked around issues
> in a device being in use while it was removed.
> 
> They never were a complete fix for these problems nor I do think they could
> be extended to be such. There were also unaddressed issues in these patches
> pointed out during the review. For these reasons I'm reverting the three
> patches. In more detail:
> 
> * media: fix media devnode ioctl/syscall and unregister race
>   6f0dd24a084a
> 
> The patch clears the registered bit before performing the steps related to
> unregistering a media device, but the bit is checked only at the beginning
> of the IOCTL call. As unregistering a device and an IOCTL call on a file
> handle of that device are not serialised, nothing guarantees the IOCTL call
> will finish with the registered bit still in the same state. Serialising the
> two e.g. by using a mutex is hardly a feasible solution for this.
> 
> I may have pointed out the original problem but this is not the solution.
> 
> <URL:http://www.spinics.net/lists/linux-media/msg101295.html>
> 
> The right solution is instead to make sure the data structures related to
> the media device will not disappear while the IOCTL call is in progress (at
> least).

They won't. Device core won't call dev.release() until the ioctl
finishes. So, the struct device and struct devnode will exist while the
ioctl (or any other fops) is handled.

> * media: fix use-after-free in cdev_put() when app exits after driver unbind
>   5b28dde51d0c
> 
> The patch avoids the problem of deleting a character device (cdev_del())
> after its memory has been released. The change is sound as such but the
> problem is addressed by another, a lot more simple patch in my series:
> 
> <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>

Your approach is not clean, as it is based on a cdev's hack of doing:

	devnode->cdev.kobj.parent = &devnode->dev.kobj;

That is an ugly hack, as it touches inside cdev's internal stuff,
to do something that the driver's core doesn't expect. This is the
kind of patch that could cause messy errors, by cheating with the
cdev's internal refcount checking.

Btw, your approach requires changes to *all* drivers in order to make
device release work, which is way more complex than changing just the
core, as the current approach does.

> It might be possible to reasonably continue from here if the next patch to
> be reverted did not depend on this one.
> 
> * media-device: dynamically allocate struct media_devnode
> 
> This creates a two-way dependency between struct media_devnode and
> media_device. This is very much against the original design which clearly
> separates the two: media_devnode is entirely independent of media_device.

Those structs are still independent.

> The original intent was that another sub-system in the kernel such as the
> V4L2 could make use of media_devnode as well and while that hasn't happened,
> perhaps the two could be merged. There simply are no other reasons to keep
> the two structs separate.
> 
> The patch is certainly a workaround, as it (partially, again) works around
> issues in timing of releasing memory and accessing it.
> 
> The proper solutions regarding the media_device and media_devnode are either
> maintain the separation or unify the two, and this patch does nor suggests
> either of these. To the contrary: it makes either of these impossible by
> design, and this reason alone is enough to revert it.
> 
> The set I'm pushing maintains the separation and leaves the option of either
> merging the two (media_device and media_devnode) or making use of
> media_devnode elsewhere open.

As mentioned before, being based on a hack doesn't make it nice
for upstream merging.

The current approach uses the recommended way: the structure with
cdev embedded should be dynamically allocated. Well, we could merge
media_device and media_devnode, but, in this case, we'll need to
not embed media_device, in order to avoid hacks like the above.

> > - change all drivers as the kAPI changes;  
> 
> The patchset actually adds new APIs rather than changing the OLD one --- as
> the old one was simply that drivers were responsible for allocating the data
> structures related to a media device. Existing drivers should continue to
> work as they did before without changes.

Are you sure? Did you try the tests we did with binding/unbind, device
removal/insert and probe/remove of em28xx with your patches applied?

In that regard, you should really test it on a USB device, which
is hot-pluggable. There, you'll see a lot more memory lifetime issues
than on omap3.

> Naturally, to get full benetifs of the changes, driver changes will be also
> required (see the beginning of the message).

The test cases we did work on em28xx. If a regression happens after any
patch of this series, you need to address it. I suspect that, even applying
the entire series, there will still be regressions, as I don't see any
changes to em28xx in this patch series.

> The set has been posted as RFC in order to get reviews. It makes no sense to
> convert all the drivers and then start changing APIs, affecting all those
> converted drivers.

Well, while it is not complete and still causes regressions, it can't be
considered ready for upstream review.

> > 
> > - be git bisectable, e. g. all patches should compile and run fine
> >   after each single patch, without introducing regressions.  
> 
> Compilation has already been tested (on ARM) on each patch applied in order.

Good, but it is best to also test it on x86. Please note that
just compiling doesn't ensure that it doesn't introduce regressions.

You should do your best to avoid regressions on every single patch
on your patch series.

> > 
> > That probably means that the series should be tested not only on
> > omap3, but also on some other device drivers.  
> 
> I fully agree with that. More review, testing and changes to at least some
> drivers (mostly for removable devices) will be needed before merging them,
> that's for sure.

Good! One more point we agree on :-)

-- 
Thanks,
Mauro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-22 17:44             ` Mauro Carvalho Chehab
@ 2016-11-22 18:13               ` Hans Verkuil
  2016-11-22 18:41                 ` Shuah Khan
  2016-11-22 22:56               ` Shuah Khan
  2016-11-28 10:45               ` Sakari Ailus
  2 siblings, 1 reply; 89+ messages in thread
From: Hans Verkuil @ 2016-11-22 18:13 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Sakari Ailus
  Cc: Shuah Khan, Sakari Ailus, linux-media, laurent.pinchart

On 22/11/16 18:44, Mauro Carvalho Chehab wrote:
>> * media: fix use-after-free in cdev_put() when app exits after driver unbind
>>   5b28dde51d0c
>>
>> The patch avoids the problem of deleting a character device (cdev_del())
>> after its memory has been released. The change is sound as such but the
>> problem is addressed by another, a lot more simple patch in my series:
>>
>> <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>
>
> Your approach is not clean, as it is based on a cdev's hack of doing:
>
> 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
>
> That is an ugly hack, as it touches inside cdev's internal stuff,
> to do something that the driver's core doesn't expect. This is the
> kind of patch that could cause messy errors, by cheating with the
> cdev's internal refcount checking.

Actually, this is what many frameworks in the kernel do:

$ git grep "kobj.parent = " drivers/
drivers/base/bus.c:     dev->kobj.parent = parent_of_root;
drivers/base/core.c:            dev->kobj.parent = kobj;
drivers/char/tpm/tpm-chip.c:    chip->cdev.kobj.parent = &chip->dev.kobj;
drivers/dax/dax.c:      cdev->kobj.parent = &dev->kobj;
drivers/gpio/gpiolib.c: gdev->chrdev.kobj.parent = &gdev->dev.kobj;
drivers/iio/industrialio-core.c:        indio_dev->chrdev.kobj.parent = &indio_dev->dev.kobj;
drivers/infiniband/core/user_mad.c:     port->cdev.kobj.parent = &umad_dev->kobj;
drivers/infiniband/core/user_mad.c:     port->sm_cdev.kobj.parent = &umad_dev->kobj;
drivers/infiniband/core/uverbs_main.c:  uverbs_dev->cdev.kobj.parent = &uverbs_dev->kobj;
drivers/infiniband/hw/hfi1/device.c:    cdev->kobj.parent = parent;
drivers/input/evdev.c:  evdev->cdev.kobj.parent = &evdev->dev.kobj;
drivers/input/joydev.c: joydev->cdev.kobj.parent = &joydev->dev.kobj;
drivers/input/mousedev.c:       mousedev->cdev.kobj.parent = &mousedev->dev.kobj;
drivers/media/cec/cec-core.c:   devnode->cdev.kobj.parent = &devnode->dev.kobj;
drivers/media/media-devnode.c:  devnode->cdev.kobj.parent = &devnode->dev.kobj;
drivers/platform/chrome/cros_ec_dev.c:  ec->cdev.kobj.parent = &ec->class_dev.kobj;
drivers/rtc/rtc-dev.c:  rtc->char_dev.kobj.parent = &rtc->dev.kobj;

And it is what Russell King told me to use in CEC as well.

fs/char_dev.c currently doesn't have a function that sets cdev.kobj.parent,
even though it does use it internally to call kobject_get/put on the
parent kobject. It really expects the caller to set cdev.kobj.parent.

It ensures that when cdev_add/del is called the parent gets correctly
refcounted as well.
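
To illustrate the refcounting, here is a userspace toy model (not kernel
code; every toy_* name is made up, and the comments only say which kernel
step each line is meant to mimic). It compares what happens with and
without the parent pointer when a file is still open at unregister time:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy kobject: refcount, optional parent, release hook. */
struct toy_kobj {
	int refcount;
	struct toy_kobj *parent;
	void (*release)(struct toy_kobj *kobj);
};

static void toy_kobj_init(struct toy_kobj *k,
			  void (*release)(struct toy_kobj *kobj))
{
	k->refcount = 1;
	k->parent = NULL;
	k->release = release;
}

static void toy_kobj_get(struct toy_kobj *k)
{
	if (k)
		k->refcount++;
}

static void toy_kobj_put(struct toy_kobj *k)
{
	if (k && --k->refcount == 0)
		k->release(k);
}

/* Toy devnode embedding both a device and a cdev. */
struct toy_devnode {
	struct toy_kobj dev;	/* must stay the first member */
	struct toy_kobj cdev;
	bool released;
};

static void toy_dev_release(struct toy_kobj *k)
{
	/* dev is the first member, so the cast recovers the container. */
	((struct toy_devnode *)k)->released = true;
}

/* The last put on the cdev drops the reference held on its parent. */
static void toy_cdev_release(struct toy_kobj *k)
{
	toy_kobj_put(k->parent);
}

/* Returns whether the devnode was released while a file was still open. */
static bool toy_released_while_open(bool set_parent)
{
	struct toy_devnode n = { .released = false };

	toy_kobj_init(&n.dev, toy_dev_release);
	toy_kobj_init(&n.cdev, toy_cdev_release);
	if (set_parent)
		n.cdev.parent = &n.dev;

	toy_kobj_get(n.cdev.parent);	/* cdev_add(): pins the parent */
	toy_kobj_get(&n.cdev);		/* open(): the file pins the cdev */

	toy_kobj_put(&n.cdev);		/* cdev_del() at unregister */
	toy_kobj_put(&n.dev);		/* put_device() at unregister */

	bool released = n.released;	/* state with the file still open */

	toy_kobj_put(&n.cdev);		/* close(): last cdev reference */
	return released;
}
```

With the parent set, the devnode survives until the file is closed;
without it, the devnode is released at unregister time while the cdev
embedded in it is still referenced by the open file.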

I plan on looking more into this patch series on Thursday or perhaps Friday.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-22 18:13               ` Hans Verkuil
@ 2016-11-22 18:41                 ` Shuah Khan
  0 siblings, 0 replies; 89+ messages in thread
From: Shuah Khan @ 2016-11-22 18:41 UTC (permalink / raw)
  To: Hans Verkuil, Mauro Carvalho Chehab, Sakari Ailus
  Cc: Sakari Ailus, linux-media, laurent.pinchart, Shuah Khan

On 11/22/2016 11:13 AM, Hans Verkuil wrote:
> On 22/11/16 18:44, Mauro Carvalho Chehab wrote:
>>> * media: fix use-after-free in cdev_put() when app exits after driver unbind
>>>   5b28dde51d0c
>>>
>>> The patch avoids the problem of deleting a character device (cdev_del())
>>> after its memory has been released. The change is sound as such but the
>>> problem is addressed by another, a lot more simple patch in my series:
>>>
>>> <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>
>>
>> Your approach is not clean, as it is based on a cdev's hack of doing:
>>
>>     devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>
>> That is an ugly hack, as it touches inside cdev's internal stuff,
>> to do something that the driver's core doesn't expect. This is the
>> kind of patch that could cause messy errors, by cheating with the
>> cdev's internal refcount checking.
> 
> Actually, this is what many frameworks in the kernel do:
> 
> $ git grep "kobj.parent = " drivers/
> drivers/base/bus.c:     dev->kobj.parent = parent_of_root;
> drivers/base/core.c:            dev->kobj.parent = kobj;
> drivers/char/tpm/tpm-chip.c:    chip->cdev.kobj.parent = &chip->dev.kobj;
> drivers/dax/dax.c:      cdev->kobj.parent = &dev->kobj;
> drivers/gpio/gpiolib.c: gdev->chrdev.kobj.parent = &gdev->dev.kobj;
> drivers/iio/industrialio-core.c:        indio_dev->chrdev.kobj.parent = &indio_dev->dev.kobj;
> drivers/infiniband/core/user_mad.c:     port->cdev.kobj.parent = &umad_dev->kobj;
> drivers/infiniband/core/user_mad.c:     port->sm_cdev.kobj.parent = &umad_dev->kobj;
> drivers/infiniband/core/uverbs_main.c:  uverbs_dev->cdev.kobj.parent = &uverbs_dev->kobj;
> drivers/infiniband/hw/hfi1/device.c:    cdev->kobj.parent = parent;
> drivers/input/evdev.c:  evdev->cdev.kobj.parent = &evdev->dev.kobj;
> drivers/input/joydev.c: joydev->cdev.kobj.parent = &joydev->dev.kobj;
> drivers/input/mousedev.c:       mousedev->cdev.kobj.parent = &mousedev->dev.kobj;
> drivers/media/cec/cec-core.c:   devnode->cdev.kobj.parent = &devnode->dev.kobj;
> drivers/media/media-devnode.c:  devnode->cdev.kobj.parent = &devnode->dev.kobj;
> drivers/platform/chrome/cros_ec_dev.c:  ec->cdev.kobj.parent = &ec->class_dev.kobj;
> drivers/rtc/rtc-dev.c:  rtc->char_dev.kobj.parent = &rtc->dev.kobj;
> 
> And it is what Russell King told me to use in CEC as well.

Hans recommended the approach to me when I tied cdev to media_devnode
as its parent while fixing the media_device lifetime issues.

> drivers/media/media-devnode.c:  devnode->cdev.kobj.parent = &devnode->dev.kobj;

> 
> fs/chardev.c currently doesn't have a function that sets cdev.kobj.parent,
> even though it does use it internally to call kobject_get/put on the
> parent kobject. It really expects the caller to set cdev.kobj.parent.
> 
> It ensures that when cdev_add/del is called the parent gets correctly
> refcounted as well.

Yes, this is what is done now to make sure the cdev lifetime matches its
parent's and the parent doesn't get released while the cdev is in use. There
seem to be some concerns about referencing the cdev private object in other
parts of the kernel. However, that is something that could be solved
by adding a cdev interface to set the parent.

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-22 17:44             ` Mauro Carvalho Chehab
  2016-11-22 18:13               ` Hans Verkuil
@ 2016-11-22 22:56               ` Shuah Khan
  2016-11-28 10:45               ` Sakari Ailus
  2 siblings, 0 replies; 89+ messages in thread
From: Shuah Khan @ 2016-11-22 22:56 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Sakari Ailus, laurent.pinchart
  Cc: Sakari Ailus, linux-media, hverkuil, Shuah Khan

On 11/22/2016 10:44 AM, Mauro Carvalho Chehab wrote:
> Em Mon, 14 Nov 2016 15:27:22 +0200
> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> 
>> Hi Mauro,
>>
>> I'm replying below but let me first summarise the remaining problem area
>> that this patchset addresses.
> 
> Sorry for answering too late. Somehow, I missed this email in the cloud.
> 
>> The problems you and Shuah have seen and partially addressed are related to
>> a larger picture which is the lifetime of (mostly) memory resources related
>> to various objects used by as well both the Media controller and V4L2
>> frameworks (including videobuf2) as the drivers which make use of these
>> frameworks.
>>
>> The Media controller and V4L2 interfaces exposed by drivers consist of
>> multiple devices nodes, data structures with interdependencies within the
>> frameworks themselves and dependencies from the driver's own data structures
>> towards the framework data structures. The Media device and the media graph
>> objects are central to the problem area as well.
>>
>> So what are the issues then? Until now, we've attempted to regulate the
>> users' ability to access the devices at the time they're being unregistered
>> (and the associated memory released), but that approach does not really
>> scale: you have to make sure that the unregistering also will not take place
>> _during_ the system call --- not just in the beginning of it.
>>
>> The media graph contains media graph objects, some of which are media
>> entities (contained in struct video_device or struct v4l2_subdev, for
>> instance). Media entities as graph nodes have links to other entities. In
>> order to implement the system calls, the drivers do parse this graph in
>> order to obtain information they need to obtain from it. For instance, it's
>> not uncommon for an implementation for video node format enumeration to
>> figure out which sub-device the link from that video nodes leads to. Drivers
>> may also have similar paths they follow.
>>
>> Interrupt handling may also be taking place during the device removal during
>> which a number of data structures are now freed. This really does call for a
>> solution based on reference counting.
>>
>> This leads to the conclusion that all the memory resources that could be
>> accessed by the drivers or the kernel frameworks must stay intact until the
>> last file handle to the said devices is closed. Otherwise, there is a
>> possibility of accessing released memory.
> 
> So far, we're aligned.
> 
>> Right now in a lot of the cases, such as for video device and sub-device
>> nodes, we do release the memory when a device (as in struct device) is being
>> unregistered. There simply is in the current mainline kernel a way to do
>> this in a safe way.
> 
>> Drivers do use devm_() family of functions to allocate
>> the memory of the media graph object and their internal data structures.
> 
> Removing devm_() from those drivers seem to be the first thing to do,
> and it is independent from any MC rework.
> 
> As you'll see below, we have different opinions on other matters,
> so, my suggestion about how to proceed is that you should submit
> first the things we're aligned.
> 
> In other words, please submit the patches that get rid of devm_()
> first. Then, we can address the remaining stuff.

I reviewed the patches that are not reverts, especially the patches
that get rid of devm usage in omap3isp. The dmesg included in this
isn't complete and I also looked at the dmesg Laurent sent me from vsp1.

I tested unbind of au0828 while vlc was streaming video on 4.9-rc5. The
unbind is successful with no Oops. vlc stops streaming and shows the updated
device list, which doesn't include the video device that disappeared due to
the unbind, just as expected.

I also tested it on 4.9-rc4 with the Media Device Allocator API, which
includes au0828 and snd_usb_audio using the API. Same result: no Oops.

So I do think it is worthwhile testing vsp1 and omap3isp on 4.9-rc5 with
just the changes to not use devm. I am going to share my analysis of the
vsp1 log Laurent shared with me. I have seen a very similar log when the
media device was devm with au0828 and snd_usb_audio.

The log shows vsp1_video_release(), which is a fops release routine, invoking
various cleanup routines until it runs into an Oops accessing an already
released resource. This release routine gets called when the user application
closes the device file.

vsp1_video is a devm resource. It gets released very early on during
unbind, from device_release():

        /*
         * Some platform devices are driven without driver attached
         * and managed resources may have been acquired.  Make sure
         * all resources are released.
         *
         * Drivers still can add resources into device after device
         * is deleted but alive, so release devres here to avoid
         * possible memory leak.
         */
        devres_release_all(dev);

So way before vsp1_video_release() is called, this resource is gone.
I think the fix for this problem is changing Vsp1 to not use devm
for a video device. I think you will see this problem even if the
driver doesn't use Media Controller API. This is a direct result of
video device being a managed resource.
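
A toy userspace model of that ordering, in case it helps (not driver code;
all toy_* names are made up, and an "alive" flag stands in for the memory
being valid so that nothing is really freed):

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RES 8

/* Stand-in for a video device structure; "alive" models valid memory. */
struct toy_video {
	bool alive;
};

/* Toy device with a devres-like list of objects released at unbind. */
struct toy_device {
	struct toy_video *managed[TOY_MAX_RES];
	int nr_managed;
};

/* devm-style tracking: the object's lifetime is tied to the binding. */
static bool toy_devm_track(struct toy_device *d, struct toy_video *v)
{
	if (d->nr_managed >= TOY_MAX_RES)
		return false;
	v->alive = true;
	d->managed[d->nr_managed++] = v;
	return true;
}

/* Stand-in for the managed resources being released at unbind time. */
static void toy_unbind(struct toy_device *d)
{
	while (d->nr_managed > 0)
		d->managed[--d->nr_managed]->alive = false;
}

/*
 * fops release, called when the application closes the file. Returns
 * true if the object was still alive when touched. With free_here set,
 * the object is "freed" from this callback instead of at unbind.
 */
static bool toy_fops_release(struct toy_video *v, bool free_here)
{
	bool was_alive = v->alive;

	if (free_here)
		v->alive = false;
	return was_alive;
}
```

The managed object is already gone when the file is closed after unbind,
while an object freed from the release callback is still valid there.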

It would be interesting to see if you can reproduce this problem on 4.9-rc5
with driver changes to convert vsp1_video from a managed resource to a
regular one. Same comment for omap3isp. I think it is worthwhile testing
both omap3 and vsp1 without the devm video resource to see if this
problem can be reproduced. That would tell us if this is a driver
specific problem tied to devm usage in the driver or a larger problem
with the media-core.

Re-ordering the patch series the following way and testing will tell us
what we really need to fix this problem you are seeing:

Make the following last patch the first one in the series:
[RFC v4 21/21] omap3isp: Don't rely on devm for memory resource management
and work from there.

Included below:


[  295.405166] Unable to handle kernel NULL pointer dereference at virtual address 000001d0
[  295.413270] pgd = ffffff80096f0000
[  295.416667] [000001d0] *pgd=0000000077ffe003, *pud=0000000077ffe003, *pmd=0000000000000000
[  295.420758]
[  295.426248] 
[  295.427738] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[  295.433308] Modules linked in:
[  295.436373] CPU: 2 PID: 901 Comm: yavta Not tainted 4.9.0-rc4+ #771
[  295.442636] Hardware name: Renesas Salvator-X board based on r8a7795 (DT)
[  295.449418] task: ffffffc0362d1500 task.stack: ffffffc035454000
[  295.455351] PC is at __lock_acquire+0x144/0x1b4c
[  295.459965] LR is at lock_acquire+0x50/0x74
[  295.464142] pc : [<ffffff8008101730>] lr : [<ffffff8008103488>] pstate: 600001c5
[  295.471530] sp : ffffffc0354578c0
[  295.474838] x29: ffffffc0354578c0 x28: ffffffc0362d1500 
[  295.480160] x27: ffffff800882a000 x26: ffffff80094b6000 
[  295.485480] x25: ffffffc03574f890 x24: ffffffc036296780 
[  295.490800] x23: ffffffc0362d1500 x22: 00000000000001d0 
[  295.496119] x21: 0000000000000000 x20: ffffffc035454000 
[  295.501439] x19: 0000000000000000 x18: 000000000000270f 
[  295.506759] x17: 0000007f78ddc9ac x16: ffffff800822b5b0 
[  295.512079] x15: ffffffc0362d1cd8 x14: ffffff8008a76000 
[  295.517398] x13: 0000000000000000 x12: ffffffc0362d1ce0 
[  295.522719] x11: 4d6961b1472a952c x10: 0000000000000000 
[  295.528038] x9 : 0000000000000001 x8 : ffffffc035454000 
[  295.533357] x7 : ffffff800845df08 x6 : 0000000000000000 
[  295.538676] x5 : 0000000000000000 x4 : 0000000000000001 
[  295.543996] x3 : 0000000000000000 x2 : 0000000000000000 
[  295.549315] x1 : 0000000000000000 x0 : ffffff8009275000 
[  295.554635] 
[  295.556122] Process yavta (pid: 901, stack limit = 0xffffffc035454020)
[  295.562645] Stack: (0xffffffc0354578c0 to 0xffffffc035458000)
[  295.568386] 78c0: ffffffc0354579d0 ffffff8008103488 0000000000000000 ffffffc035454000
[  295.576209] 78e0: 0000000000000140 ffffff80094b6000 ffffffc0362d1500 ffffffc036296780
[  295.584032] 7900: ffffffc03574f890 ffffffc035454000 0000000000000008 0000000000000000
[  295.591856] 7920: ffffffc03574f890 ffffffc035454000 0000000000000008 0000000000000000
[  295.599679] 7940: 0000000000000000 ffffffc035454000 00000000000001c0 ffffffc03630fac0
[  295.607502] 7960: ffffffc035f64020 ffffffc036296780 ffffffc0354579b0 ffffff8008101114
[  295.615325] 7980: 0000000000000001 ffffffc0362d1500 ffffffc0354579d0 ffffff8008101114
[  295.623147] 79a0: 0000000000000000 ffffffc0362d1500 0000000000000007 0000000000000006
[  295.630970] 79c0: ffffff8000000000 ffffffc000000000 ffffffc035457a00 ffffff800857ecac
[  295.638793] 79e0: ffffffc035454000 0000000000000170 ffffffc03630f018 ffffff80094b6000
[  295.646616] 7a00: ffffffc035457a80 ffffff800845df08 0000000000000170 ffffffc03630f038
[  295.654440] 7a20: ffffffc03630f018 ffffffc03630fac0 ffffffc035f64020 ffffffc036296780
[  295.662262] 7a40: ffffffc03574f890 ffffffc035454000 0000000000000008 0000000000000000
[  295.670085] 7a60: ffffffc03630f018 ffffffc03630fac0 ffffffc035f64020 0000000000000000
[  295.677908] 7a80: ffffffc035457aa0 ffffff8008480e94 ffffffc03562ec00 ffffffc03630faf8
[  295.685731] 7aa0: ffffffc035457ae0 ffffff8008477820 ffffffc03630f7d0 ffffffc03630f830
[  295.693554] 7ac0: ffffffc03630f748 ffffffc03630f7d0 ffffffc035f64020 ffffff8008481110
[  295.701377] 7ae0: ffffffc035457b20 ffffff8008478cd8 ffffffc03630f7d0 ffffffc03630f830
[  295.709200] 7b00: ffffffc03630f748 ffffffc035b74c00 ffffffc035f64020 ffffffc03630f830
[  295.717022] 7b20: ffffffc035457b40 ffffff800847a994 ffffffc03630f018 ffffffc03574f880
[  295.724845] 7b40: ffffffc035457b50 ffffff8008481154 ffffffc035457b80 ffffff800845f0b4
[  295.732668] 7b60: ffffffc03574f880 ffffffc03630f038 ffffffc036296780 ffffffc036440000
[  295.740491] 7b80: ffffffc035457bb0 ffffff80081e2d04 ffffffc03574f880 0000000000000008
[  295.748314] 7ba0: ffffffc036296780 0000000000000000 ffffffc035457c10 ffffff80081e2ea4
[  295.756137] 7bc0: ffffffc035bb0380 ffffffc0362d1ba8 0000000000000000 ffffffc0362d1500
[  295.763959] 7be0: ffffff8008863338 ffffffc035b2e1c0 ffffffc035b2e260 ffffff80080dcba4
[  295.771782] 7c00: ffffff800880e298 0000000000000000 ffffffc035457c20 ffffff80080d9744
[  295.779605] 7c20: ffffffc035457c60 ffffff80080be56c ffffffc0362d1500 ffffffc035457cc0
[  295.787428] 7c40: ffffffc035454000 0000000000000001 ffffff80087f5000 ffffff80080be568
[  295.795250] 7c60: ffffffc035457cd0 ffffff80080bec84 ffffffc035c80540 0000000000000002
[  295.803073] 7c80: ffffffc035454000 ffffffc035c80540 ffffffc036194ec0 ffffffc035457de8
[  295.810896] 7ca0: ffffff80087fa27c ffffffc035454000 0000000000000008 0000000000000000
[  295.818719] 7cc0: ffffffc0361956c8 ffffff8008581c88 ffffffc035457d00 ffffff80080ca514
[  295.826543] 7ce0: ffffffc035454000 ffffffc035457e08 ffffffc0361956c8 ffffffc035c80540
[  295.834366] 7d00: ffffffc035457d70 ffffff8008087d94 000000000041c678 ffffffc035457de8
[  295.842189] 7d20: fffffffffffffe00 000000000041c67c 0000000060000000 ffffffc035457ec0
[  295.850012] 7d40: 0000000000000123 000000000000001d ffffff8008592000 ffffffc035454000
[  295.857835] 7d60: ffffff80086d67f8 ffffff8008811d38 ffffffc035457e90 ffffff80080883dc
[  295.865658] 7d80: 0000000000000009 ffffffc035454000 ffffffc035454000 ffffffc035457ec0
[  295.873481] 7da0: 0000000060000000 0000000000000015 0000000000000123 000000000000001d
[  295.881304] 7dc0: ffffff8008592000 ffffffc035454000 ffffffc036296780 0000000000000003
[  295.889127] 7de0: 0000007fda7fab48 ffffff80081e0da8 ffffffc035457e80 ffffff80081f5238
[  295.896950] 7e00: 0000000000000000 0000000000000009 ffffffc000000000 0000000000000000
[  295.904773] 7e20: ffffffc035457e50 ffffff8008101248 ffffff8008088394 ffffffc0362d1500
[  295.912595] 7e40: 0000000000000001 ffffffc035457ec0 ffffffc035457e80 ffffff80081012c0
[  295.920419] 7e60: 0000000000000009 ffffffc035454000 ffffffc035454000 ffffff80080fe194
[  295.928241] 7e80: ffffffc035457e90 ffffff8008088394 0000000000000000 ffffff8008082ddc
[  295.936064] 7ea0: 0000000000000000 0000000029ae9c98 ffffffffffffffff 000000000041c67c
[  295.943887] 7ec0: 0000000000000003 00000000c0585611 0000007fda7fab48 0000007fda7face0
[  295.951710] 7ee0: 0000000000000001 0000000000000009 000000000000003f 0000000000000000
[  295.959533] 7f00: 000000000000001d 0000000000000004 7f7f7f7f7f7f7f7f 0101010101010101
[  295.967355] 7f20: 0000000000000018 0000000100000000 0000007fda7fa060 0000000000499000
[  295.975178] 7f40: 0000000000000000 0000000000000001 0000007fda7fa1c8 0000000000240000
[  295.983001] 7f60: 0000000029ae9c98 0000007fda7fab48 0000000000000001 0000007fda7fabe5
[  295.990824] 7f80: 0000000029ae9ce0 000000000000011e 0000000000000040 0000000000000000
[  295.998646] 7fa0: 0000000000468000 0000007fda7fa9a0 0000000000401db0 0000007fda7fa970
[  296.006469] 7fc0: 000000000041c678 0000000060000000 0000000000000003 ffffffffffffffff
[  296.014292] 7fe0: 0000000000000000 0000000000000000 ffffffffffffffff ffffffffffffffff
[  296.022116] Call trace:
[  296.024558] Exception stack(0xffffffc0354576f0 to 0xffffffc035457820)
[  296.030991] 76e0:                                   0000000000000000 0000008000000000
[  296.038814] 7700: ffffffc0354578c0 ffffff8008101730 ffffffc03682a298 0000000000000002
[  296.046637] 7720: ffffffc0354578a0 ffffff80080f373c 00000000fffff3e7 0000000000000001
[  296.054460] 7740: ffffffc03682a400 0000000000000000 ffffffc035457790 ffffff8008101114
[  296.062283] 7760: 0000000000000000 ffffffc0362d1500 0000000000000007 0000000000000006
[  296.070106] 7780: 0000000000000000 0000000000000000 ffffff8009275000 0000000000000000
[  296.077929] 77a0: 0000000000000000 0000000000000000 0000000000000001 0000000000000000
[  296.085752] 77c0: 0000000000000000 ffffff800845df08 ffffffc035454000 0000000000000001
[  296.093575] 77e0: 0000000000000000 4d6961b1472a952c ffffffc0362d1ce0 0000000000000000
[  296.101398] 7800: ffffff8008a76000 ffffffc0362d1cd8 ffffff800822b5b0 0000007f78ddc9ac
[  296.109223] [<ffffff8008101730>] __lock_acquire+0x144/0x1b4c
[  296.114876] [<ffffff8008103488>] lock_acquire+0x50/0x74
[  296.120104] [<ffffff800857ecac>] mutex_lock_nested+0x54/0x39c
[  296.125851] [<ffffff800845df08>] media_entity_pipeline_stop+0x24/0x40
[  296.132290] [<ffffff8008480e94>] vsp1_video_stop_streaming+0x8c/0x12c
[  296.138730] [<ffffff8008477820>] __vb2_queue_cancel+0x30/0x13c
[  296.144558] [<ffffff8008478cd8>] vb2_core_queue_release+0x20/0x4c
[  296.150645] [<ffffff800847a994>] vb2_queue_release+0xc/0x14
[  296.156212] [<ffffff8008481154>] vsp1_video_release+0x74/0x7c
[  296.161952] [<ffffff800845f0b4>] v4l2_release+0x3c/0x90
[  296.167176] [<ffffff80081e2d04>] __fput+0x98/0x1e0
[  296.171961] [<ffffff80081e2ea4>] ____fput+0xc/0x14
[  296.176752] [<ffffff80080d9744>] task_work_run+0xf4/0x100
[  296.182150] [<ffffff80080be56c>] do_exit+0x2f4/0x99c
[  296.187109] [<ffffff80080bec84>] do_group_exit+0x40/0x9c
[  296.192417] [<ffffff80080ca514>] get_signal+0x204/0x6d4
[  296.197640] [<ffffff8008087d94>] do_signal+0x140/0x554
[  296.202772] [<ffffff80080883dc>] do_notify_resume+0x9c/0xb0
[  296.208340] [<ffffff8008082ddc>] work_pending+0x8/0x14
[  296.213473] Code: 52800034 79004420 14000052 90008ba0 (f94002c2) 
[  296.219612] ---[ end trace b863a77bc90af9ef ]---
[  296.224235] Fixing recursive fault but reboot is needed!

thanks,
-- Shuah

> 
>>
>> With this patchset:
>>
>> - The media_device which again contains the media_devnode is allocated
>>   dynamically. The lifetime of the media device --- and the media graph
>>   objects it contains --- is bound to device nodes that are bound to the
>>   media device (video and sub-device nodes) as well as open file handles.
> 
> No. Data structures with cdev embedded into them have their lifetime
> controlled by the driver's core, and are destroyed only when there's
> no pending fops. The current approach uses device's core dev.release()
> callback to release memory.
> 
> In other words, dev.release() is only called after the driver's base
> knows that the cdev is not in use anymore. So, no ioctl() or any
> other syscalls on that point.
> 
> Ok, nothing prevents some driver from doing the wrong thing, keeping a
> copy of struct device and using it after free, for example storing
> it in devm-allocated memory and printing some debug message
> after struct device is freed, but this is a driver bug.
> 
> What really worries me about this series is that it seems that you
> didn't understand how the current approach works. So, you decided
> to just revert it and start from scratch. This is dangerous, as
> it could cause problems in scenarios other than yours.
> 
>> - Care is taken that the unregistration process and releasing memory happens
>>   in the right order. This was not always the case previously.
> 
> Freeing memory for struct media_devnode, struct device and struct cdev 
> is currently handled by the driver's core, when it is known to be safe,
> and using the same logic that other subsystems do.
> 
> We might do it differently, but we need a strong reason to do so, as
> moving away from the usual practice is dangerous.
> 
>> - The driver remains responsible for the memory of the video and sub-device
>>   nodes. However, now the Media controller provides a convenient callback to
>>   the driver to release any memory resources when the time has come to do
>>   so. This takes place just before the media device memory is released.
> 
> Drivers could use devnode->dev.release for that. Of course, if they
> override it, they should be calling media_devnode_release() on their
> internal release functions.
> 
>> - Drivers that do not strictly need to be removable require no changes. The
>>   benefits of this set become tangible for any driver by changing how the
>>   driver allocates memory for the data structures. Ideally at least
>>   drivers for hot-removable devices should be converted.
> 
> Drivers should allow device removal and/or driver removal. If you're
> doing any change here, you need to touch *all* drivers to use the new 
> way.
> 
>> In order to make the current drivers to behave well it is necessary to make
>> changes to how memory is allocated in the drivers. If you look at the sample
>> patches that are part of the set for the omap3isp driver, you'll find that
>> around 95% of the changes are related to removing the use of the devm_() family
>> of functions instead of Media controller API changes. In this regard, the
>> approach taken here requires very little if any additional overhead.
> 
> Well, send the patches that do the 95% of the changes first, e.g. the devm_()
> removal, check that you aren't using any dev_foo() printk after
> unregister, and send that patch series without RFC. Then test what's
> still broken, if anything, and let's discuss your results in a way
> that lets us all reproduce the issues you may be facing on other drivers
> that don't use devm*().
> 
> 
>> On Wed, Nov 09, 2016 at 03:46:08PM -0200, Mauro Carvalho Chehab wrote:
>>> Em Wed, 9 Nov 2016 10:00:58 -0700
>>> Shuah Khan <shuahkh@osg.samsung.com> escreveu:
>>>   
>>>>> Maybe we can get the Media Device Allocator API work in and then we can
>>>>> get your RFC series in after that. Here is what I propose:
>>>>>
>>>>> - Keep the fixes in 4.9  
>>>
>>> Fixes should always be kept. Reverting a fix is not an option.
>>> Instead, do incremental patches on the top of it.
>>>   
>>>>> - Get Media Device Allocator API patches into 4.9.    
>>>>
>>>> I meant 4.10 not 4.9
>>>>   
>>>>> - snd-usb-audio work go into 4.10
>>>
>>> Sounds like a plan.
>>>   
>>>>> Then your RFC series could go in. I am looking at the RFC series and that
>>>>> the drivers need to change as well, so this RFC work could take longer.
>>>>> Since we have to make media_device sharable, it is necessary to have a
>>>>> global list approach Media Device Allocator API takes. So it is possible
>>>>> for your RFC series to go on top of the Media Device Allocator API.  
>>>
>>> Firstly, the RFC series should be converted into something that can
>>> be applicable upstream, e. g.:
>>>
>>> - doing the changes over the top of upstream, instead of needing to
>>>   revert patches;  
>>
>> The patches are in fact on top of the current media-tree, or were when they
>> were sent (v4).
>>
>> The reason I'm reverting patches is that the reason why these patches were
>> merged was not because they would have been a sound way forward for the
>> Media controller framework, but because they partially worked around issues
>> in a device being in use while it was removed.
>>
>> They never were a complete fix for these problems nor do I think they could
>> be extended to be such. There were also unaddressed issues in these patches
>> pointed out during the review. For these reasons I'm reverting the three
>> patches. In more detail:
>>
>> * media: fix media devnode ioctl/syscall and unregister race
>>   6f0dd24a084a
>>
>> The patch clears the registered bit before performing the steps related to
>> unregistering a media device, but the bit is checked only at the beginning
>> of the IOCTL call. As unregistering a device and an IOCTL call on a file
>> handle of that device are not serialised, nothing guarantees the IOCTL call
>> will finish with the registered bit still in the same state. Serialising the
>> two e.g. by using a mutex is hardly a feasible solution for this.
>>
>> I may have pointed out the original problem but this is not the solution.
>>
>> <URL:http://www.spinics.net/lists/linux-media/msg101295.html>
>>
>> The right solution is instead to make sure the data structures related to
>> the media device will not disappear while the IOCTL call is in progress (at
>> least).
> 
> They won't. Device core won't call dev.release() until the ioctl has
> finished. So, the struct device and struct devnode will exist while the
> ioctl (or any other fops) is handled.
> 
>> * media: fix use-after-free in cdev_put() when app exits after driver unbind
>>   5b28dde51d0c
>>
>> The patch avoids the problem of deleting a character device (cdev_del())
>> after its memory has been released. The change is sound as such but the
>> problem is addressed by another, a lot more simple patch in my series:
>>
>> <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>
> 
> Your approach is not clean, as it is based on a cdev's hack of doing:
> 
> 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
> 
> That is an ugly hack, as it touches inside cdev's internal stuff,
> to do something that the driver's core doesn't expect. This is the
> kind of patch that could cause messy errors, by cheating with the
> cdev's internal refcount checking.
> 
> Btw, your approach requires changes to *all* drivers in order to make
> device release work, which is way more complex than changing just the
> core, as the current approach does.
> 
>> It might be possible to reasonably continue from here if the next patch to
>> be reverted did not depend on this one.
>>
>> * media-device: dynamically allocate struct media_devnode
>>
>> This creates a two-way dependency between struct media_devnode and
>> media_device. This is very much against the original design which clearly
>> separates the two: media_devnode is entirely independent of media_device.
> 
> Those structs are still independent.
> 
>> The original intent was that another sub-system in the kernel such as the
>> V4L2 could make use of media_devnode as well and while that hasn't happened,
>> perhaps the two could be merged. There simply are no other reasons to keep
>> the two structs separate.
>>
>> The patch is certainly a workaround, as it (partially, again) works around
>> issues in timing of releasing memory and accessing it.
>>
>> The proper solutions regarding the media_device and media_devnode are either
>> to maintain the separation or to unify the two, and this patch does not suggest
>> either of these. To the contrary: it makes either of these impossible by
>> design, and this reason alone is enough to revert it.
>>
>> The set I'm pushing maintains the separation and leaves the option of either
>> merging the two (media_device and media_devnode) or making use of
>> media_devnode elsewhere open.
> 
> As mentioned before, being based on a hack doesn't make it nice
> for upstream merging.
> 
> The current approach uses the recommended way: the structure with
> cdev embedded should be dynamically allocated. Well, we could merge
> media_device and media_devnode, but, in this case, we'll need to
> not embed media_device, in order to avoid hacks like the above.
> 
>>> - change all drivers as the kAPI changes;  
>>
>> The patchset actually adds new APIs rather than changing the old one --- as
>> the old one was simply that drivers were responsible for allocating the data
>> structures related to a media device. Existing drivers should continue to
>> work as they did before without changes.
> 
> Are you sure? Did you try the tests we did with binding/unbind, device
> removal/insert and probe/remove of em28xx with your patches applied?
> 
> In that regard, you should really test it on a USB device, which
> is hot-pluggable. There, you'll see a lot more memory lifetime issues
> than on omap3.
> 
>> Naturally, to get the full benefits of the changes, driver changes will also be
>> required (see the beginning of the message).
> 
> The test cases we did work on em28xx. If a regression happens after any
> patch of this series, you need to address it. I suspect that, even with
> the entire series applied, there will still be regressions, as I don't see any
> changes to em28xx in this patch series.
> 
>> The set has been posted as RFC in order to get reviews. It makes no sense to
>> convert all the drivers and then start changing APIs, affecting all those
>> converted drivers.
> 
> Well, while it is not complete and still causes regressions, it can't be
> considered ready for upstream review.
> 
>>>
>>> - be git bisectable, e. g. all patches should compile and run fine
>>>   after each single patch, without introducing regressions.  
>>
>> Compilation has already been tested (on ARM) on each patch applied in order.
> 
> Good, but the best is to test it also on x86. Please notice that
> just compiling doesn't ensure that it doesn't introduce regressions.
> 
> You should do your best to avoid regressions on every single patch
> on your patch series.
> 
>>>
>>> That probably means that the series should be tested not only on
>>> omap3, but also on some other device drivers.  
>>
>> I fully agree with that. More review, testing and changes to at least some
>> drivers (mostly for removable devices) will be needed before merging them,
>> that's for sure.
> 
> Good! One more point we agree :-)
> 


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-22 17:44             ` Mauro Carvalho Chehab
  2016-11-22 18:13               ` Hans Verkuil
  2016-11-22 22:56               ` Shuah Khan
@ 2016-11-28 10:45               ` Sakari Ailus
  2016-11-29 11:13                 ` Mauro Carvalho Chehab
  2 siblings, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2016-11-28 10:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shuah Khan, Sakari Ailus, linux-media, hverkuil, laurent.pinchart

Hi Mauro,

On Tue, Nov 22, 2016 at 03:44:29PM -0200, Mauro Carvalho Chehab wrote:
> Em Mon, 14 Nov 2016 15:27:22 +0200
> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> 
> > Hi Mauro,
> > 
> > I'm replying below but let me first summarise the remaining problem area
> > that this patchset addresses.
> 
> Sorry for answering too late. Somehow, I missed this email in the cloud.
> 
> > The problems you and Shuah have seen and partially addressed are related to
> > a larger picture which is the lifetime of (mostly) memory resources related
> > to various objects used by both the Media controller and V4L2
> > frameworks (including videobuf2) as well as the drivers which make use of these
> > frameworks.
> > 
> > The Media controller and V4L2 interfaces exposed by drivers consist of
> > multiple device nodes, data structures with interdependencies within the
> > frameworks themselves and dependencies from the driver's own data structures
> > towards the framework data structures. The Media device and the media graph
> > objects are central to the problem area as well.
> > 
> > So what are the issues then? Until now, we've attempted to regulate the
> > users' ability to access the devices at the time they're being unregistered
> > (and the associated memory released), but that approach does not really
> > scale: you have to make sure that the unregistering also will not take place
> > _during_ the system call --- not just in the beginning of it.
> >
> > The media graph contains media graph objects, some of which are media
> > entities (contained in struct video_device or struct v4l2_subdev, for
> > instance). Media entities as graph nodes have links to other entities. In
> > order to implement the system calls, the drivers do parse this graph in
> > order to obtain the information they need from it. For instance, it's
> > not uncommon for an implementation of video node format enumeration to
> > figure out which sub-device the link from that video node leads to. Drivers
> > may also have similar paths they follow.
> > 
> > Interrupt handling may also be taking place during the device removal during
> > which a number of data structures are now freed. This really does call for a
> > solution based on reference counting.
> > 
> > This leads to the conclusion that all the memory resources that could be
> > accessed by the drivers or the kernel frameworks must stay intact until the
> > last file handle to the said devices is closed. Otherwise, there is a
> > possibility of accessing released memory.
> 
> So far, we're aligned.
> 
> > Right now in a lot of the cases, such as for video device and sub-device
> > nodes, we do release the memory when a device (as in struct device) is being
> > unregistered. There simply is no way in the current mainline kernel to do
> > this safely.
> 
> > Drivers do use devm_() family of functions to allocate
> > the memory of the media graph object and their internal data structures.
> 
> Removing devm_() from those drivers seems to be the first thing to do,
> and it is independent from any MC rework.
> 
> As you'll see below, we have different opinions on other matters,
> so, my suggestion about how to proceed is that you should submit
> first the things we're aligned on.
> 
> In other words, please submit the patches that get rid of devm_()
> first. Then, we can address the remaining stuff.

Removing devm_*() is needed, but when should the memory be released then?
There's no callback currently from the media device the driver could use.

OTOH devm_*() interfaces are very convenient to use, and it's a lot of extra
work for drivers to handle releasing all the resources themselves. It'd be
great to find another object to bind those resources to. Still, device_release()
first releases devres resources and then calls the release() callback,
which obviously makes the setup problematic to begin with.
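
To make the ordering issue concrete, here is a minimal userspace sketch in C.
It is not kernel code: struct model_device, struct model_res and all the
model_*() functions are hypothetical names invented for this illustration.
It only models the order described above (managed resources are released
first, the release() callback runs afterwards) and records whether the
callback could still have used the managed resource.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a devm-allocated driver structure. */
struct model_res {
	bool alive;	/* false once the "devres" pass has released it */
};

/* Hypothetical stand-in for struct device with one managed resource. */
struct model_device {
	struct model_res *managed;			/* devm-style resource */
	void (*release)(struct model_device *dev);	/* release() callback */
	bool release_saw_live_resource;			/* set by the callback */
};

/* Release callback that wants to look at driver data one last time. */
void model_release(struct model_device *dev)
{
	dev->release_saw_live_resource = dev->managed && dev->managed->alive;
}

/* Models the ordering above: managed resources first, then release(). */
void model_device_release(struct model_device *dev)
{
	assert(dev != NULL);
	if (dev->managed)
		dev->managed->alive = false;	/* models devres_release_all() */
	if (dev->release)
		dev->release(dev);
}

/* Returns true if the release callback could still use the resource. */
bool model_run(void)
{
	struct model_res res = { .alive = true };
	struct model_device dev = {
		.managed = &res,
		.release = model_release,
		.release_saw_live_resource = false,
	};

	model_device_release(&dev);
	return dev.release_saw_live_resource;
}
```

In this model model_run() returns false: by the time the callback runs, the
managed resource is already gone. The model uses a flag instead of freeing
memory, so it stays well defined; in a real driver the equivalent access
would touch released memory.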

> 
> > 
> > With this patchset:
> > 
> > - The media_device which again contains the media_devnode is allocated
> >   dynamically. The lifetime of the media device --- and the media graph
> >   objects it contains --- is bound to device nodes that are bound to the
> >   media device (video and sub-device nodes) as well as open file handles.
> 
> No. Data structures with cdev embedded into them have their lifetime
> controlled by the driver's core, and are destroyed only when there's
> no pending fops. The current approach uses device's core dev.release()

Fair enough; that part is indeed handled towards the user space as far as I
can tell. However that's still not enough: the media graph contains the
graph objects, and the media device that holds the graph must outlive the
graph objects themselves.

Also removing entities doesn't really work currently: touching an entity, a
link or any kind of a graph object is not guaranteed to work unless you hold
the media graph lock. And that's simply unfeasible. Just look at what the
drivers do with entities: they use the v4l2_subdev interface and the control
framework to access them.

These data structures contain struct media_entity in them, and that entity
is part of the media graph. Other drivers use entities e.g. to obtain
control values from them. References should be used to prevent releasing the
memory.
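
As a rough illustration of what reference counting buys here, below is a
userspace sketch in C. It is loosely modelled on the kref pattern
(kref_init(), kref_get(), kref_put() with a release function) but does not
use it: struct model_obj and the model_*() helpers are hypothetical, and the
plain counter is not atomic, so this is a single-threaded sketch only.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object, loosely modelled on the kref pattern. */
struct model_obj {
	unsigned int refcount;
	void (*release)(struct model_obj *obj);
};

/* Counts how often the release callback ran (for illustration only). */
int model_released;

void model_obj_release(struct model_obj *obj)
{
	model_released++;
	free(obj);
}

struct model_obj *model_obj_alloc(void)
{
	struct model_obj *obj = calloc(1, sizeof(*obj));

	if (!obj)
		return NULL;
	obj->refcount = 1;	/* the allocator holds the first reference */
	obj->release = model_obj_release;
	return obj;
}

struct model_obj *model_obj_get(struct model_obj *obj)
{
	assert(obj->refcount > 0);	/* getting a dead object is a bug */
	obj->refcount++;
	return obj;
}

void model_obj_put(struct model_obj *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0)
		obj->release(obj);	/* last user gone: free only now */
}

/* The driver unbinds while a file handle is still open. Returns 0 if the
 * memory survived the unbind and was released exactly once on close. */
int model_unbind_while_open(void)
{
	struct model_obj *obj = model_obj_alloc();
	int released_after_unbind;

	if (!obj)
		return -1;
	model_released = 0;
	model_obj_get(obj);	/* open(): the file handle takes a reference */
	model_obj_put(obj);	/* unbind: the driver drops its reference */
	released_after_unbind = model_released;
	model_obj_put(obj);	/* close(): last reference, memory is freed */
	return (released_after_unbind == 0 && model_released == 1) ? 0 : 1;
}
```

The point of the sketch is only the ordering: dropping the driver's reference
at unbind does not free anything as long as another user still holds one.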

media_entity_get() and media_entity_put() do not do what you'd expect.

v4l2_subdev_call() should also verify that a sub-device is registered, and
make sure it will stay that way for the duration of call: the driver must be
able to expect the entity is accessible as the driver registered it.

The same goes for the control framework.

As far as I remember, we somehow assumed that just holding references to the
related kernel modules would be enough to prevent this, but it is not.

I would prefer to postpone this, however; the patchset already does enough
for a single patchset. Fixing this properly would likely require wound/wait
mutexes for individual entities.

> callback to release memory.
> 
> In other words, dev.release() is only called after the driver's base
> knows that the cdev is not in use anymore. So, no ioctl() or any
> other syscalls on that point.
> 
> Ok, nothing prevents some driver from doing the wrong thing, keeping a
> copy of struct device and using it after free, for example storing
> it in devm-allocated memory and printing some debug message
> after struct device is freed, but this is a driver bug.
> 
> What really worries me about this series is that it seems that you
> didn't understand how the current approach works. So, you decided
> to just revert it and start from scratch. This is dangerous, as
> it could cause problems in scenarios other than yours.

I'm not quite sure what you mean.

It may well be that the patchset will require changes but that's precisely
the reason why patches are reviewed before merging.

> 
> > - Care is taken that the unregistration process and releasing memory happens
> >   in the right order. This was not always the case previously.
> 
> Freeing memory for struct media_devnode, struct device and struct cdev 
> is currently handled by the driver's core, when it is known to be safe,
> and using the same logic that other subsystems do.

That's simply not the case. Other sub-systems do not have graphs managed by
multiple device drivers for multiple physical devices that expose device
nodes through which all of those devices can be accessed. The problem domain
is far more complex than if you had a single physical device for which a
driver would expose a device node or two to the user space.

> 
> We might do it differently, but we need a strong reason to do so, as
> moving away from the usual practice is dangerous.

I think we already did that when we merged the original Media controller and
V4L2 sub-device patches...

> 
> > - The driver remains responsible for the memory of the video and sub-device
> >   nodes. However, now the Media controller provides a convenient callback to
> >   the driver to release any memory resources when the time has come to do
> >   so. This takes place just before the media device memory is released.
> 
> Drivers could use devnode->dev.release for that. Of course, if they
> override it, they should be calling media_devnode_release() on their
> internal release functions.

That'd be really hackish. The drivers don't currently deal with
media_devnode directly and I don't think they should be obliged to.

> 
> > - Drivers that do not strictly need to be removable require no changes. The
> >   benefits of this set become tangible for any driver by changing how the
> >   driver allocates memory for the data structures. Ideally at least
> >   drivers for hot-removable devices should be converted.
> 
> Drivers should allow device removal and/or driver removal. If you're
> doing any change here, you need to touch *all* drivers to use the new 
> way.

Let's first agree on what needs to be fixed and how, and then think about
converting the drivers. Buggy code has a tendency to continue to be buggy
unless it is fixed (or replaced).

> 
> > In order to make the current drivers to behave well it is necessary to make
> > changes to how memory is allocated in the drivers. If you look at the sample
> > patches that are part of the set for the omap3isp driver, you'll find that
> > around 95% of the changes are related to removing the use of the devm_() family
> > of functions instead of Media controller API changes. In this regard, the
> > approach taken here requires very little if any additional overhead.
> 
> Well, send the patches that do the 95% of the changes first, e.g. the devm_()
> removal, check that you aren't using any dev_foo() printk after
> unregister, and send that patch series without RFC. Then test what's
> still broken, if anything, and let's discuss your results in a way
> that lets us all reproduce the issues you may be facing on other drivers
> that don't use devm*().

As I said, there's currently no way to properly release these resources as
the driver won't receive a callback from media device release.

> 
> 
> > On Wed, Nov 09, 2016 at 03:46:08PM -0200, Mauro Carvalho Chehab wrote:
> > > Em Wed, 9 Nov 2016 10:00:58 -0700
> > > Shuah Khan <shuahkh@osg.samsung.com> escreveu:
> > >   
> > > > > Maybe we can get the Media Device Allocator API work in and then we can
> > > > > get your RFC series in after that. Here is what I propose:
> > > > > 
> > > > > - Keep the fixes in 4.9  
> > > 
> > > Fixes should always be kept. Reverting a fix is not an option.
> > > Instead, do incremental patches on the top of it.
> > >   
> > > > > - Get Media Device Allocator API patches into 4.9.    
> > > > 
> > > > I meant 4.10 not 4.9
> > > >   
> > > > > - snd-usb-audio work go into 4.10
> > > 
> > > Sounds like a plan.
> > >   
> > > > > Then your RFC series could go in. I am looking at the RFC series and that
> > > > > the drivers need to change as well, so this RFC work could take longer.
> > > > > Since we have to make media_device sharable, it is necessary to have a
> > > > > global list approach Media Device Allocator API takes. So it is possible
> > > > > for your RFC series to go on top of the Media Device Allocator API.  
> > > 
> > > Firstly, the RFC series should be converted into something that can
> > > be applicable upstream, e. g.:
> > > 
> > > - doing the changes over the top of upstream, instead of needing to
> > >   revert patches;  
> > 
> > The patches are in fact on top of the current media-tree, or were when they
> > were sent (v4).
> > 
> > The reason I'm reverting patches is that the reason why these patches were
> > merged was not because they would have been a sound way forward for the
> > Media controller framework, but because they partially worked around issues
> > in a device being in use while it was removed.
> > 
> > They never were a complete fix for these problems nor I do think they could
> > be extended to be such. There were also unaddressed issues in these patches
> > pointed out during the review. For these reasons I'm reverting the three
> > patches. In more detail:
> > 
> > * media: fix media devnode ioctl/syscall and unregister race
> >   6f0dd24a084a
> > 
> > The patch clears the registered bit before performing the steps related to
> > unregistering a media device, but the bit is checked only at the beginning
> > of the IOCTL call. As unregistering a device and an IOCTL call on a file
> > handle of that device are not serialised, nothing guarantees the IOCTL call
> > will finish with the registered bit still in the same state. Serialising the
> > two e.g. by using a mutex is hardly a feasible solution for this.
> > 
> > I may have pointed out the original problem but this is not the solution.
> > 
> > <URL:http://www.spinics.net/lists/linux-media/msg101295.html>
> > 
> > The right solution is instead to make sure the data structures related to
> > the media device will not disappear while the IOCTL call is in progress (at
> > least).
> 
> They won't. Device core won't call dev.release() while an ioctl doesn't
> finish. So, the struct device and struct devnode will exist while the
> ioctl (or any other fops) is handled.

I believe you're right when it comes to drivers using video devices without
Media controller. However, the Media devices and V4L2 sub-device nodes are
another matter, as are the drivers. The drivers need to be able to rely on
the frameworks to support them. On MC the driver simply has no way to
release the media device at the right time. The same applies to V4L2
sub-devices --- something that could be added to the patchset.

> 
> > * media: fix use-after-free in cdev_put() when app exits after driver unbind
> >   5b28dde51d0c
> > 
> > The patch avoids the problem of deleting a character device (cdev_del())
> > after its memory has been released. The change is sound as such but the
> > problem is addressed by another, a lot more simple patch in my series:
> > 
> > <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>
> 
> Your approach is not clean, as it is based on a cdev's hack of doing:
> 
> 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
> 
> That is an ugly hack, as it touches inside cdev's internal stuff,
> to do something that the driver's core doesn't expect. This is the
> kind of patch that could cause messy errors, by cheating with the
> cdev's internal refcount checking.
> 
> Btw, your approach require changes on *all* drivers, in order to make
> device release work, with is a way more complex than changing just the
> core. as the current approach. 
> 
> > It might be possible to reasonably continue from here if the next patch to
> > be reverted did not depend on this one.
> > 
> > * media-device: dynamically allocate struct media_devnode
> > 
> > This creates a two-way dependency between struct media_devnode and
> > media_device. This is very much against the original design which clearly
> > separates the two: media_devnode is entirely independent of media_device.
> 
> Those structs are still independent.
> 
> > The original intent was that another sub-system in the kernel such as the
> > V4L2 could make use of media_devnode as well and while that hasn't happened,
> > perhaps the two could be merged. There simply are no other reasons to keep
> > the two structs separate.
> > 
> > The patch is certainly a workaround, as it (partially, again) works around
> > issues in timing of releasing memory and accessing it.
> > 
> > The proper solutions regarding the media_device and media_devnode are either
> > maintain the separation or unify the two, and this patch does nor suggests
> > either of these. To the contrary: it makes either of these impossible by
> > design, and this reason alone is enough to revert it.
> > 
> > The set I'm pushing maintains the separation and leaves the option of either
> > merging the two (media_device and media_devnode) or making use of
> > media_devnode elsewhere open.
> 
> As mentioned before, being based on a hack doesn't make it nice
> for upstream merging.
> 
> The current approach uses the recommended way: the structure with
> cdev embedded should be dynamically allocated. Well, we could merge
> media_device and media_devnode, but, in this case, we'll need to
> not embed media_device, in order to avoid hacks like the above.

The current approach is simply not enough, be cdev allocated separately from
media_devnode or not: the drivers have no way to properly release memory
related to the media devices or the v4l2 sub-devices. That memory will get
accessed through IOCTL calls: simply checking that a device was registered
at one point does not mean it continues to be registered at another point in
time in the future, unless the two operations are serialised in one way or
another.
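
The check-then-use problem can be illustrated with a small userspace model.
This is only a sketch under stated assumptions: all cu_* names are
hypothetical, a "freed" flag stands in for a real kfree() so the demo stays
well defined, and the unregister is injected between the two halves of the
IOCTL through a callback rather than by a real concurrent thread. In the
refcounted variant the open file handle is assumed to hold a reference.

```c
#include <assert.h>

/* Userspace model only; none of these names exist in the kernel. */
struct cu_dev {
	int registered;	/* the "registered" bit checked by the IOCTL */
	int refcount;	/* references held by registration and open files */
	int freed;	/* set where the real code would release the memory */
};

static void cu_put(struct cu_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->freed = 1;
}

/* Bit-only scheme: unregister clears the bit and releases immediately. */
static void cu_unregister_bit_only(struct cu_dev *d)
{
	d->registered = 0;
	d->freed = 1;
}

/* Refcounted scheme: unregister only drops the registration reference. */
static void cu_unregister_refcounted(struct cu_dev *d)
{
	d->registered = 0;
	cu_put(d);
}

/*
 * Model of an IOCTL: the bit is checked once at the start, then the
 * "interleave" callback plays the part of a racing unregister, then the
 * rest of the IOCTL touches the data. Returns 1 if it touched freed data.
 */
static int cu_ioctl(struct cu_dev *d, void (*interleave)(struct cu_dev *))
{
	if (!d->registered)
		return 0;
	interleave(d);
	return d->freed;
}

/* Returns 1: the IOCTL ends up touching released memory. */
int cu_demo_bit_only(void)
{
	struct cu_dev d = { .registered = 1, .refcount = 1, .freed = 0 };

	return cu_ioctl(&d, cu_unregister_bit_only);
}

/* Returns 1 if the data survived the IOCTL and was freed on close. */
int cu_demo_refcounted(void)
{
	/* One reference for registration, one for the open file handle. */
	struct cu_dev d = { .registered = 1, .refcount = 2, .freed = 0 };
	int touched_freed = cu_ioctl(&d, cu_unregister_refcounted);

	cu_put(&d);	/* close() drops the file handle's reference */
	return !touched_freed && d.freed;
}
```

The point of the model is only that the bit check says nothing about the
rest of the call, whereas a reference held for the lifetime of the file
handle does.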

> 
> > > - change all drivers as the kAPI changes;  
> > 
> > The patchset actually adds new APIs rather than changing the OLD one --- as
> > the old one was simply that drivers were responsible for allocating the data
> > structures related to a media device. Existing drivers should continue to
> > work as they did before without changes.
> 
> Are you sure? Did you try the tests we did with binding/unbind, device
> removal/insert and probe/remove of em28xx with your patches applied?

I haven't tested that but as a matter of fact, I think I indeed have such a
device so I could test it. Changes on the DVB side would be needed as well
in order to benefit from the API for allocating the media device.

> 
> With that regards, you should really test it on an USB device, with
> is hot-pluggable. There, you'll see a lot more memory lifetime issues
> than on omap3.

I'm not so sure about USB devices: unbinding works the same way whether or
not the device is actually hot-pluggable. Still, testing with different kinds
of devices definitely does help to root out issues, that's for sure.

> 
> > Naturally, to get full benetifs of the changes, driver changes will be also
> > required (see the beginning of the message).
> 
> The test cases we did works on em28xx. If, after each patch of this series,
> a regression happens, you need to address. I suspect that, even applying
> the entire series, there will still be regressions, as I don't see any
> changes to em28xx on this patch series.

That's true, I've only changed the omap3isp driver so far as I wanted to get
feedback on the framework changes.

> 
> > The set has been posted as RFC in order to get reviews. It makes no sense to
> > convert all the drivers and then start changing APIs, affecting all those
> > converted drivers.
> 
> Well, while it is not complete and still cause regressions, It can't be
> considered ready for upstream review.
> 
> > > 
> > > - be git bisectable, e. g. all patches should compile and run fine
> > >   after each single patch, without introducing regressions.  
> > 
> > Compilation has already been tested (on ARM) on each patch applied in order.
> 
> Good, but the best is to test it also on x86. Please notice that
> just compiling doesn't ensure that it doesn't introduce regressions.
> 
> You should do your best to avoid regressions on every single patch
> on your patch series.

Certainly. Otherwise, there would be fewer patches than there are
now...

> 
> > > 
> > > That probably means that the series should be tested not only on
> > > omap3, but also on some other device drivers.  
> > 
> > I fully agree with that. More review, testing and changes to at least some
> > drivers (mostly for removable devices) will be needed before merging them,
> > that's for sure.
> 
> Good! One more point we agree :-)

That's progress. It's a good start but we need more than that.

-- 
Regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-28 10:45               ` Sakari Ailus
@ 2016-11-29 11:13                 ` Mauro Carvalho Chehab
  2016-12-13 10:53                   ` Sakari Ailus
  0 siblings, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-11-29 11:13 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Shuah Khan, Sakari Ailus, linux-media, hverkuil, laurent.pinchart

Hi Sakari,

I answered you point by point below, but I suspect that you missed how the
current approach works. So, I decided to write a quick summary here.

The character devices /dev/media? are created via cdev, which relies on a
kobject per device, which has an embedded struct kref inside.

Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
struct device, when the code does:
	devnode->cdev.kobj.parent = &devnode->dev.kobj;

before calling cdev_add().

The current lifetime management is actually based on cdev's kobject's
refcount, provided by its embedded kref.

The kref guarantees that any data associated with /dev/media0 won't be
freed while there are any pending system calls. In other words, when
cdev_del() is called, it will remove /dev/media0 from the filesystem, and
will call kobject_put().

If the refcount drops to zero, it will call devnode->dev.release(). If the
kobject refcount is not zero, the data won't be freed.

So, in the best case scenario, there are no open file descriptors
by the time the media device node is unregistered. So, it will free
everything.

In the worst case scenario, e. g. when the driver is removed or
unbound while /dev/media0 has some open file descriptor(s),
the cdev logic will do the proper lifetime management.

In such a case, /dev/media0 disappears from the file system, so another open
is not possible anymore. The data structures will remain allocated until
all associated file descriptors are closed.

When all file descriptors are closed, the data will be freed.

At that time, it will call an optional dev.release() callback,
responsible for freeing any other data struct that the driver allocated.
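
The lifetime rule summarised above can be sketched as a userspace model.
This is not the kernel code: the lt_* names are hypothetical, a plain
integer stands in for the kobject's kref, and the model assumes that each
open file descriptor holds one reference in addition to the one held by
registration.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model only; these are not kernel APIs. */
struct lt_devnode {
	int refcount;	/* stands in for the kobject's embedded kref */
	int registered;	/* node still visible in the file system */
	int *released;	/* set to 1 when the release callback runs */
};

/* Plays the role of dev.release(): runs once, on the last put. */
static void lt_release(struct lt_devnode *d)
{
	*d->released = 1;
	free(d);
}

static void lt_put(struct lt_devnode *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		lt_release(d);
}

static struct lt_devnode *lt_register(int *released)
{
	struct lt_devnode *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;	/* reference held by registration */
	d->registered = 1;
	d->released = released;
	*released = 0;
	return d;
}

/* open(): refused once the node is gone, otherwise takes a reference. */
static int lt_open(struct lt_devnode *d)
{
	if (!d->registered)
		return -1;
	d->refcount++;
	return 0;
}

static void lt_close(struct lt_devnode *d)
{
	lt_put(d);
}

/* Unregister: hide the node and drop the registration reference. */
static void lt_unregister(struct lt_devnode *d)
{
	d->registered = 0;
	lt_put(d);
}

/* Best case: no open descriptors, so unregister frees everything. */
int lt_demo_unbind_idle(void)
{
	int released;
	struct lt_devnode *d = lt_register(&released);

	if (!d)
		return -1;
	lt_unregister(d);
	return released;
}

/* Worst case: unbind with an open descriptor defers the release. */
int lt_demo_unbind_while_open(void)
{
	int released, deferred;
	struct lt_devnode *d = lt_register(&released);

	if (!d || lt_open(d) != 0)
		return -1;
	lt_unregister(d);
	/* Still allocated, but a new open is no longer possible. */
	deferred = (released == 0) && (lt_open(d) == -1);
	lt_close(d);	/* last reference: release runs here */
	return deferred && released == 1;
}
```

Both demo functions return 1 when the model behaves as described: release
runs immediately in the idle case and only at the last close otherwise.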

Em Mon, 28 Nov 2016 12:45:56 +0200
Sakari Ailus <sakari.ailus@iki.fi> escreveu:

> Hi Mauro,
> 
> On Tue, Nov 22, 2016 at 03:44:29PM -0200, Mauro Carvalho Chehab wrote:
> > Em Mon, 14 Nov 2016 15:27:22 +0200
> > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> >   
> > > Hi Mauro,
> > > 
> > > I'm replying below but let me first summarise the remaining problem area
> > > that this patchset addresses.  
> > 
> > Sorry for answering too late. Somehow, I missed this email in the cloud.
> >   
> > > The problems you and Shuah have seen and partially addressed are related to
> > > a larger picture which is the lifetime of (mostly) memory resources related
> > > to various objects used by as well both the Media controller and V4L2
> > > frameworks (including videobuf2) as the drivers which make use of these
> > > frameworks.
> > > 
> > > The Media controller and V4L2 interfaces exposed by drivers consist of
> > > multiple devices nodes, data structures with interdependencies within the
> > > frameworks themselves and dependencies from the driver's own data structures
> > > towards the framework data structures. The Media device and the media graph
> > > objects are central to the problem area as well.
> > > 
> > > So what are the issues then? Until now, we've attempted to regulate the
> > > users' ability to access the devices at the time they're being unregistered
> > > (and the associated memory released), but that approach does not really
> > > scale: you have to make sure that the unregistering also will not take place
> > > _during_ the system call --- not just in the beginning of it.
> > >
> > > The media graph contains media graph objects, some of which are media
> > > entities (contained in struct video_device or struct v4l2_subdev, for
> > > instance). Media entities as graph nodes have links to other entities. In
> > > order to implement the system calls, the drivers do parse this graph in
> > > order to obtain information they need to obtain from it. For instance, it's
> > > not uncommon for an implementation for video node format enumeration to
> > > figure out which sub-device the link from that video nodes leads to. Drivers
> > > may also have similar paths they follow.
> > > 
> > > Interrupt handling may also be taking place during the device removal during
> > > which a number of data structures are now freed. This really does call for a
> > > solution based on reference counting.
> > > 
> > > This leads to the conclusion that all the memory resources that could be
> > > accessed by the drivers or the kernel frameworks must stay intact until the
> > > last file handle to the said devices is closed. Otherwise, there is a
> > > possibility of accessing released memory.  
> > 
> > So far, we're aligned.
> >   
> > > Right now in a lot of the cases, such as for video device and sub-device
> > > nodes, we do release the memory when a device (as in struct device) is being
> > > unregistered. There simply is in the current mainline kernel a way to do
> > > this in a safe way.  
> >   
> > > Drivers do use devm_() family of functions to allocate
> > > the memory of the media graph object and their internal data structures.  
> > 
> > Removing devm_() from those drivers seem to be the first thing to do,
> > and it is independent from any MC rework.
> > 
> > As you'll see below, we have different opinions on other matters,
> > so, my suggestion about how to proceed is that you should submit
> > first the things we're aligned.
> > 
> > In other words, please submit the patches that get rid of devm_()
> > first. Then, we can address the remaining stuff.  
> 
> Removing devm_*() is needed, but when should the memory be released then?
> There's no callback currently from the media device the driver could use.

It should be easy to add a release callback if you need one. Yet, I think you
don't need a callback for that. Instead, you could just use the already
existing one in struct device, e. g. export media_devnode_release() and,
in drivers that need to release additional data, you would be doing something
like:

	static void my_devnode_release(struct device *cd)
	{
		/* Some code that would release things before kfree(dev) */
		kthread_stop(foo_thread);
		kfree(foo);

		/* will internally do a kfree(dev) */
		media_devnode_release(cd);

		/* Some code that would release things after kfree(dev) */
		kfree(bar);
	}

And set the new release callback after registering the media device with:

	media_device_register(...);
	devnode->dev.release = my_devnode_release;

The advantage of such an approach is that it allows controlling the order
in which things will be freed/released.
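
The ordering argument can be modelled in userspace as follows. This is a
sketch only: the ro_* names are hypothetical, ro_core_release() stands in
for media_devnode_release(), and the drvdata pointer is an assumption of
the model rather than a statement about how a real driver would find its
data. Note that the driver data pointer has to be fetched before the core
release runs, since the device memory is gone afterwards.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model only; these are not kernel APIs. */
struct ro_dev {
	void (*release)(struct ro_dev *dev);
	void *drvdata;	/* hypothetical link to driver-private data */
};

struct ro_drv {
	int *foo;	/* must go away before the device memory */
	int *bar;	/* may only go away after the device memory */
};

static char ro_log[64];	/* records the order of the release steps */

static void ro_log_step(const char *step)
{
	strncat(ro_log, step, sizeof(ro_log) - strlen(ro_log) - 1);
}

/* Stand-in for the core's release: frees the device itself. */
static void ro_core_release(struct ro_dev *dev)
{
	ro_log_step("core;");
	free(dev);
}

/* Driver override wrapping the core release. */
static void ro_drv_release(struct ro_dev *dev)
{
	struct ro_drv *drv = dev->drvdata;	/* fetch before dev is freed */

	ro_log_step("foo;");
	free(drv->foo);

	ro_core_release(dev);	/* dev must not be touched after this */

	ro_log_step("bar;");
	free(drv->bar);
	free(drv);
}

/* Returns the recorded order of the release steps, or NULL on failure. */
const char *ro_demo_release_order(void)
{
	struct ro_dev *dev = calloc(1, sizeof(*dev));
	struct ro_drv *drv = calloc(1, sizeof(*drv));

	if (!dev || !drv) {
		free(dev);
		free(drv);
		return NULL;
	}
	drv->foo = malloc(sizeof(*drv->foo));
	drv->bar = malloc(sizeof(*drv->bar));
	dev->drvdata = drv;

	dev->release = ro_core_release;	/* what registration would set */
	dev->release = ro_drv_release;	/* driver override afterwards */

	ro_log[0] = '\0';
	dev->release(dev);	/* last reference dropped */
	return ro_log;
}
```

A single before-or-after callback could express either the "foo" step or
the "bar" step, but not both around the core release.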

> 
> OTOH devm_*() interfaces are very convenient to use, it's a lot of extra
> work for drivers to handle releasing all the resources. It'd be great to
> find another object where to bind those resources. Still, device_release()
> does first release devres resources and then calls the release() callback,
> which obviously makes the setup problematic to begin with.

Shuah's approach is providing another way to bind things. Yet, maybe
it could still be possible to use devm_*(), if it has a way to
control when devm will free its resources. I suspect that, if you call
devm_free() during dev.release() callback, or if you use the same struct
device that is associated with the cdev, devm will work.

> >   
> > > 
> > > With this patchset:
> > > 
> > > - The media_device which again contains the media_devnode is allocated
> > >   dynamically. The lifetime of the media device --- and the media graph
> > >   objects it contains --- is bound to device nodes that are bound to the
> > >   media device (video and sub-device nodes) as well as open file handles.  
> > 
> > No. Data structures with cdev embedded into them have their lifetime
> > controlled by the driver's core, and are destroyed only when there's
> > no pending fops. The current approach uses device's core dev.release()  
> 
> Fair enough; that part is indeed handled towards the user space as far as I
> can tell. However that's still not enough: the media graph contains the
> graph objects, and the media device that holds the graph, must outlive the
> graph objects themselves.

Sorry, didn't follow you here. What's the sense of not freeing the media
graph before destroying the struct device associated with /dev/media0?
In other words, what should remain alive after the chardev's data is freed?

Please notice that the driver's core kobject kref ensures that the device
release code is called only after all file descriptors are closed, and
no other syscall would affect the cdev.

> Also removing entities doesn't really work currently: touching an entity, a
> link or any kind of a graph object is not guaranteed to work unless you hold
> the media graph lock. And that's simply unfeasible.

Sorry, again, didn't follow you here. The current strategy for adding
and removing things at the graph relies on a lock, which serializes access
to the graph, in order to avoid races if someone is trying to navigate
the graph while an object is being inserted or removed.

It could be converted into a lockless approach (for example, using RCU),
but this is a separate issue.

The removal code needs to use whatever lock (or lockless) scheme we
use to serialize the access to the graph.

> Just look at what the
> drivers do with entities: they use the v4l2_subdev interface and the control
> framework to access them.
> 
> These data structures contain struct media_entity in them, and that entity
> is part of the media graph. Other drivers use entities e.g. to obtain
> control values from them. References should be used to prevent releasing the
> memory.

References are used by the driver's core, using kobject_get() and
kobject_put(). That guarantees that dev.release() will only be called
when nobody is using it anymore.

> media_entity_get() and media_entity_put() do not do what you'd expect.

Please elaborate.

> v4l2_subdev_call() should also verify that a sub-device is registered, and
> make sure it will stay that way for the duration of call: the driver must be
> able to expect the entity is accessible as the driver registered it.

Yes, but I can't see how this is related to this discussion. Before
unregistering struct device, you need to unbind the subdevs.

The only case I can see of calling v4l2_subdev_call() after all file
descriptors are closed is if you have some kthread running. You need to 
call kthread_stop() for such kthreads before freeing struct device.

You could do it at a my_devnode_release() if you need the kthread running
even after closing all file descriptors, or even before that, before
calling media_device_unregister().

> The same goes for the control framework.

I don't think we have kthreads for controls. The control routines
are called only when a file descriptor is opened. So, I don't see
any possible issue with the control framework.

> As far as I remember, we somehow assumed that just acquiring the related
> kernel modules would be enough to counter this but it is not.

Well, if not, you could use kobject_get() and kobject_put() to increment
or decrement the cdev's refcount. Yet, I suspect that, if the drivers are
properly designed, you won't need to manually touch the kref.

> 
> I would prefer to postpone this however, the patchset already does enough
> for a single patchset. Fixing this properly would likely require wait/wound
> mutexes for individual entities.
> 
> > callback to release memory.
> > 
> > In other words, dev.release() is only called after the driver's base
> > knows that the cdev is not in use anymore. So, no ioctl() or any
> > other syscalls on that point.
> > 
> > Ok, nothing prevents some driver to do the wrong thing, keeping a
> > copy of struct device and using it after free, for example storing
> > it on a devm alocated memory, and printing some debug message
> > after struct device is freed, but this is a driver's bug.
> > 
> > What really worries me on this series is that it seemed that you 
> > didn't understood how the current approach works. So, you decided
> > to just revert it and start from scratch. This is dangerous, as
> > it could cause problems to other scenarios than yours.  
> 
> I'm not quite sure what do you mean.
> 
> It may well be that the patchset will require changes but that's precisely
> the reason why patches are reviewed before merging.

From your comments and from your code, you didn't seem to realize that
the current approach relies on the struct device refcount. See above.

> >   
> > > - Care is taken that the unregistration process and releasing memory happens
> > >   in the right order. This was not always the case previously.  
> > 
> > Freeing memory for struct media_devnode, struct device and struct cdev 
> > is currently handled by the driver's core, when it known to be safe,
> > and using the same logic that other subsystems do.  
> 
> That's simply not the case. Other sub-systems do not have graphs managed by
> multiple device drivers for multiple physical devices that expose device
> nodes through which all of those devices can be accessed. The problem domain
> is far more complex than if you had a single physical device for which a
> driver would expose a device node or two to the user space.

No. The current approach uses the struct device associated with /dev/media0,
created via cdev, to provide a refcount for the data associated with the
character device.

The struct device kobject refcount ensures that everything associated
with it will only be freed after the refcount goes to zero.

As I said before, if there are any cases where the refcount drops
to zero early, it is just a matter of adding a few kobject_get() and
kobject_put() calls to ensure that this won't happen early, if the driver is
so broken that it is unable to do the right refcount.

> 
> > 
> > We might do it different, but we need a strong reason to do it, as
> > going away from the usual practice is dangerous.  
> 
> I think we already did that when we merged the original Media controller and
> V4L2 sub-device patches...
> 
> >   
> > > - The driver remains responsible for the memory of the video and sub-device
> > >   nodes. However, now the Media controller provides a convenient callback to
> > >   the driver to release any memory resources when the time has come to do
> > >   so. This takes place just before the media device memory is released.  
> > 
> > Drivers could use devnode->dev.release for that. Of course, if they
> > override it, they should be calling media_devnode_release() on their
> > internal release functions.  
> 
> That'd be really hackish. The drivers currently don't deal with
> media_devnode directly now and I don't think they should be obliged to.

I'm not against adding a callback instead. However, that makes it lose
flexibility, as the callback will either be called before or after
freeing struct device.

By overriding the dev.release callback, we have a finer control.

If you don't see any case where we'll be freeing data after freeing
struct device, then a callback would work.

> >   
> > > - Drivers that do not strictly need to be removable require no changes. The
> > >   benefits of this set become tangible for any driver by changing how the
> > >   driver allocates memory for the data structures. Ideally at least
> > >   drivers for hot-removable devices should be converted.  
> > 
> > Drivers should allow device removal and/or driver removal. If you're
> > doing any change here, you need to touch *all* drivers to use the new 
> > way.  
> 
> Let's first agree on what needs to be fixed and how, and then think about
> converting the drivers. Buggy code has a tendency to continue to be buggy
> unless it is fixed (or replaced).

True, but as I said, this series creates buggy code when it ignores what
was fixed already. Also, a patch series, to be considered ready for
upstream, needs to make the needed changes in all drivers it affects.

> > > In order to make the current drivers to behave well it is necessary to make
> > > changes to how memory is allocated in the drivers. If you look at the sample
> > > patches that are part of the set for the omap3isp driver, you'll find that
> > > around 95% of the changes are related to removing the user of devm_() family
> > > of functions instead of Media controller API changes. In this regard, the
> > > approach taken here requires very little if any additional overhead.  
> > 
> > Well, send the patches that do the 95% of the changes first e. g. devm_()
> > removal, and check if you aren't using any dev_foo() printk after
> > unregister, and send such patch series, without RFC. Then test what's
> > still broken, if any and let's discuss with your results, in a way
> > that we can all reproduce the issues you may be facing on other drivers
> > that don't use devm*().  
> 
> As I said, there's currently no way to properly release these resources as
> the driver won't receive a callback from media device release.

If you're so convinced that it is needed and you won't be overriding
media device's struct device release callback, just add it. It should
be a 3-line patch.
> 
> > 
> >   
> > > On Wed, Nov 09, 2016 at 03:46:08PM -0200, Mauro Carvalho Chehab wrote:  
> > > > Em Wed, 9 Nov 2016 10:00:58 -0700
> > > > Shuah Khan <shuahkh@osg.samsung.com> escreveu:
> > > >     
> > > > > > Maybe we can get the Media Device Allocator API work in and then we can
> > > > > > get your RFC series in after that. Here is what I propose:
> > > > > > 
> > > > > > - Keep the fixes in 4.9    
> > > > 
> > > > Fixes should always be kept. Reverting a fix is not an option.
> > > > Instead, do incremental patches on the top of it.
> > > >     
> > > > > > - Get Media Device Allocator API patches into 4.9.      
> > > > > 
> > > > > I meant 4.10 not 4.9
> > > > >     
> > > > > > - snd-usb-auido work go into 4.10    
> > > > 
> > > > Sounds like a plan.
> > > >     
> > > > > > Then your RFC series could go in. I am looking at the RFC series and that
> > > > > > the drivers need to change as well, so this RFC work could take longer.
> > > > > > Since we have to make media_device sharable, it is necessary to have a
> > > > > > global list approach Media Device Allocator API takes. So it is possible
> > > > > > for your RFC series to go on top of the Media Device Allocator API.    
> > > > 
> > > > Firstly, the RFC series should be converted into something that can
> > > > be applicable upstream, e. g.:
> > > > 
> > > > - doing the changes over the top of upstream, instead of needing to
> > > >   revert patches;    
> > > 
> > > The patches are in fact on top of the current media-tree, or were when they
> > > were sent (v4).
> > > 
> > > The reason I'm reverting patches is that the reason why these patches were
> > > merged was not because they would have been a sound way forward for the
> > > Media controller framework, but because they partially worked around issues
> > > in a device being in use while it was removed.
> > > 
> > > They never were a complete fix for these problems nor I do think they could
> > > be extended to be such. There were also unaddressed issues in these patches
> > > pointed out during the review. For these reasons I'm reverting the three
> > > patches. In more detail:
> > > 
> > > * media: fix media devnode ioctl/syscall and unregister race
> > >   6f0dd24a084a
> > > 
> > > The patch clears the registered bit before performing the steps related to
> > > unregistering a media device, but the bit is checked only at the beginning
> > > of the IOCTL call. As unregistering a device and an IOCTL call on a file
> > > handle of that device are not serialised, nothing guarantees the IOCTL call
> > > will finish with the registered bit still in the same state. Serialising the
> > > two e.g. by using a mutex is hardly a feasible solution for this.
> > > 
> > > I may have pointed out the original problem but this is not the solution.
> > > 
> > > <URL:http://www.spinics.net/lists/linux-media/msg101295.html>
> > > 
> > > The right solution is instead to make sure the data structures related to
> > > the media device will not disappear while the IOCTL call is in progress (at
> > > least).  
> > 
> > They won't. Device core won't call dev.release() while an ioctl doesn't
> > finish. So, the struct device and struct devnode will exist while the
> > ioctl (or any other fops) is handled.  
> 
> I believe you're right when it comes to drivers using video devices without
> Media controller. However the Media devices and V4L2 sub-device nodes are
> another matter as well as the drivers The drivers need to be able to rely on
> the frameworks to support them. On MC the driver simply has no way to
> release the media device at the right time. The same applies to V4L2
> sub-devices --- something that could be added to the patchset.

Huh? What's the sense of removing /dev/media0 and its associated
struct device before releasing the media graph?

The problem here is exactly the same as for *any* other character device:
you need *first* to stop using whatever data struct is needed for
controlling the /dev/media device and *then* remove /dev/media and
free its data structures, including struct device.

> >   
> > > * media: fix use-after-free in cdev_put() when app exits after driver unbind
> > >   5b28dde51d0c
> > > 
> > > The patch avoids the problem of deleting a character device (cdev_del())
> > > after its memory has been released. The change is sound as such but the
> > > problem is addressed by another, a lot more simple patch in my series:
> > > 
> > > <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>  
> > 
> > Your approach is not clean, as it is based on a cdev's hack of doing:
> > 
> > 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
> > 
> > That is an ugly hack, as it touches inside cdev's internal stuff,
> > to do something that the driver's core doesn't expect. This is the
> > kind of patch that could cause messy errors, by cheating with the
> > cdev's internal refcount checking.
> > 
> > Btw, your approach require changes on *all* drivers, in order to make
> > device release work, with is a way more complex than changing just the
> > core. as the current approach. 
> >   
> > > It might be possible to reasonably continue from here if the next patch to
> > > be reverted did not depend on this one.
> > > 
> > > * media-device: dynamically allocate struct media_devnode
> > > 
> > > This creates a two-way dependency between struct media_devnode and
> > > media_device. This is very much against the original design which clearly
> > > separates the two: media_devnode is entirely independent of media_device.  
> > 
> > Those structs are still independent.
> >   
> > > The original intent was that another sub-system in the kernel such as the
> > > V4L2 could make use of media_devnode as well and while that hasn't happened,
> > > perhaps the two could be merged. There simply are no other reasons to keep
> > > the two structs separate.
> > > 
> > > The patch is certainly a workaround, as it (partially, again) works around
> > > issues in timing of releasing memory and accessing it.
> > > 
> > > The proper solutions regarding the media_device and media_devnode are either
> > > maintain the separation or unify the two, and this patch does nor suggests
> > > either of these. To the contrary: it makes either of these impossible by
> > > design, and this reason alone is enough to revert it.
> > > 
> > > The set I'm pushing maintains the separation and leaves the option of either
> > > merging the two (media_device and media_devnode) or making use of
> > > media_devnode elsewhere open.  
> > 
> > As mentioned before, being based on a hack doesn't make it nice
> > for upstream merging.
> > 
> > The current approach uses the recommended way: the structure with
> > cdev embedded should be dynamically allocated. Well, we could merge
> > media_device and media_devnode, but, in this case, we'll need to
> > not embed media_device, in order to avoid hacks like the above.  
> 
> The current approach is simply not enough, be cdev allocated separately from
> media_devnode or not: the drivers have no way to properly release memory
> related to the media devices or the V4L2 sub-devices. That memory will get
> accessed through IOCTL calls: simply checking that a device was registered
> at one point does not mean it continues to be registered at a later point
> in time, unless the two operations are serialised in one way or
> another.

Huh? The current approach relies on kref.

> >   
> > > > - change all drivers as the kAPI changes;    
> > > 
> > > The patchset actually adds new APIs rather than changing the OLD one --- as
> > > the old one was simply that drivers were responsible for allocating the data
> > > structures related to a media device. Existing drivers should continue to
> > > work as they did before without changes.  
> > 
> > Are you sure? Did you try the tests we did with binding/unbind, device
> > removal/insert and probe/remove of em28xx with your patches applied?  
> 
> I haven't tested that but, as a matter of fact, I think I do have such a
> device so I could test it. Changes on the DVB side would be needed as well
> in order to benefit from the API for allocating the media device.
> 
> > 
> > In that regard, you should really test it on a USB device, which
> > is hot-pluggable. There, you'll see a lot more memory lifetime issues
> > than on omap3.
> 
> I'm not so sure about USB devices: unbinding works the same way whether or
> not the device is actually hot-pluggable. Still, testing with different kinds
> of devices definitely does help to root out issues, that's for sure.
> 
> >   
> > > Naturally, to get the full benefits of the changes, driver changes will
> > > also be required (see the beginning of the message).
> > 
> > The test cases we did work on em28xx. If a regression happens after any
> > patch of this series, you need to address it. I suspect that, even applying
> > the entire series, there will still be regressions, as I don't see any
> > changes to em28xx on this patch series.
> 
> That's true, I've only changed the omap3isp driver so far as I wanted to get
> feedback on the framework changes.
> 
> >   
> > > The set has been posted as RFC in order to get reviews. It makes no sense to
> > > convert all the drivers and then start changing APIs, affecting all those
> > > converted drivers.  
> > 
> > Well, while it is not complete and still causes regressions, it can't be
> > considered ready for upstream review.
> >   
> > > > 
> > > > - be git bisectable, e. g. all patches should compile and run fine
> > > >   after each single patch, without introducing regressions.    
> > > 
> > > Compilation has already been tested (on ARM) on each patch applied in order.  
> > 
> > Good, but the best is to test it also on x86. Please notice that
> > just compiling doesn't ensure that it doesn't introduce regressions.
> > 
> > You should do your best to avoid regressions on every single patch
> > on your patch series.  
> 
> Certainly. Otherwise, there would be fewer patches than there are
> now...
> 
> >   
> > > > 
> > > > That probably means that the series should be tested not only on
> > > > omap3, but also on some other device drivers.    
> > > 
> > > I fully agree with that. More review, testing and changes to at least some
> > > drivers (mostly for removable devices) will be needed before merging them,
> > > that's for sure.  
> > 
> > Good! One more point we agree :-)  
> 
> That's progress. It's a good start but we need more than that.
> 

Thanks,
Mauro

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-11-29 11:13                 ` Mauro Carvalho Chehab
@ 2016-12-13 10:53                   ` Sakari Ailus
  2016-12-13 12:24                     ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2016-12-13 10:53 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shuah Khan, Sakari Ailus, linux-media, hverkuil, laurent.pinchart

Hi Mauro,

On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> Hi Sakari,
> 
> I answered you point to point below, but I suspect that you missed how the 
> current approach works. So, I decided to write a quick summary here.
> 
> The character devices /dev/media? are created via cdev, which relies on a
> kobject per device, which has an embedded struct kref inside.
> 
> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> struct device, when the code does:
> 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
> 
> before calling cdev_add().
> 
> The current lifetime management is actually based on cdev's kobject's
> refcount, provided by its embedded kref.
> 
> The kref guarantees that any data associated with /dev/media0 won't be
> freed if there is any pending system call. In other words, when
> cdev_del() is called, it will remove /dev/media0 from the filesystem, and
> will call kobject_put().
> 
> If the refcount is zero, it will call devnode->dev.release(). If the 
> kobject refcount is not zero, the data won't be freed.
> 
> So, in the best case scenario, there's no opened file descriptors
> by the time media device node is unregistered. So, it will free
> everything.
> 
> In the worst case scenario, e. g. when the driver is removed or
> unbound while /dev/media0 has some opened file descriptor(s),
> the cdev logic will do the proper lifetime management.
> 
> In such a case, /dev/media0 disappears from the file system, so another open
> is not possible anymore. The data structures will remain allocated until
> all associated file descriptors are closed.
> 
> When all file descriptors are closed, the data will be freed.
> 
> At that time, it will call an optional dev.release() callback,
> responsible for freeing any other data struct that the driver allocated.

The patchset does not change this. It's not a question of the media_devnode
struct either. That's not an issue.

The issue is rather what else can be accessed through the media device and
other interfaces. As IOCTLs are not serialised with device removal (which
now releases many of the data structures), there's a high chance of accessing
released memory (or mutexes that have already been destroyed). An example of
that is in the log linked below: stopping a running pipeline after unbinding
the device. What happens there is that the media device is released whilst
it's in use through the video device.

<URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
<URL:http://www.spinics.net/lists/linux-media/msg108943.html>

> 
> Em Mon, 28 Nov 2016 12:45:56 +0200
> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> 
> > Hi Mauro,
> > 
> > On Tue, Nov 22, 2016 at 03:44:29PM -0200, Mauro Carvalho Chehab wrote:
> > > Em Mon, 14 Nov 2016 15:27:22 +0200
> > > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> > >   
> > > > Hi Mauro,
> > > > 
> > > > I'm replying below but let me first summarise the remaining problem area
> > > > that this patchset addresses.  
> > > 
> > > Sorry for answering too late. Somehow, I missed this email in the cloud.
> > >   
> > > > The problems you and Shuah have seen and partially addressed are related to
> > > > a larger picture which is the lifetime of (mostly) memory resources related
> > > > to various objects used by both the Media controller and V4L2
> > > > frameworks (including videobuf2) as well as the drivers which make use
> > > > of these frameworks.
> > > > 
> > > > The Media controller and V4L2 interfaces exposed by drivers consist of
> > > > multiple devices nodes, data structures with interdependencies within the
> > > > frameworks themselves and dependencies from the driver's own data structures
> > > > towards the framework data structures. The Media device and the media graph
> > > > objects are central to the problem area as well.
> > > > 
> > > > So what are the issues then? Until now, we've attempted to regulate the
> > > > users' ability to access the devices at the time they're being unregistered
> > > > (and the associated memory released), but that approach does not really
> > > > scale: you have to make sure that the unregistering also will not take place
> > > > _during_ the system call --- not just in the beginning of it.
> > > >
> > > > The media graph contains media graph objects, some of which are media
> > > > entities (contained in struct video_device or struct v4l2_subdev, for
> > > > instance). Media entities as graph nodes have links to other entities. In
> > > > order to implement the system calls, the drivers do parse this graph in
> > > > order to obtain information they need to obtain from it. For instance, it's
> > > > not uncommon for an implementation for video node format enumeration to
> > > > figure out which sub-device the link from that video nodes leads to. Drivers
> > > > may also have similar paths they follow.
> > > > 
> > > > Interrupt handling may also be taking place during the device removal during
> > > > which a number of data structures are now freed. This really does call for a
> > > > solution based on reference counting.
> > > > 
> > > > This leads to the conclusion that all the memory resources that could be
> > > > accessed by the drivers or the kernel frameworks must stay intact until the
> > > > last file handle to the said devices is closed. Otherwise, there is a
> > > > possibility of accessing released memory.  
> > > 
> > > So far, we're aligned.
> > >   
> > > > Right now in a lot of the cases, such as for video device and sub-device
> > > > nodes, we do release the memory when a device (as in struct device) is being
> > > > unregistered. There simply is no way in the current mainline kernel to
> > > > do this in a safe way.
> > >   
> > > > Drivers do use devm_() family of functions to allocate
> > > > the memory of the media graph object and their internal data structures.  
> > > 
> > > Removing devm_() from those drivers seem to be the first thing to do,
> > > and it is independent from any MC rework.
> > > 
> > > As you'll see below, we have different opinions on other matters,
> > > so, my suggestion about how to proceed is that you should submit
> > > first the things we're aligned.
> > > 
> > > In other words, please submit the patches that get rid of devm_()
> > > first. Then, we can address the remaining stuff.  
> > 
> > Removing devm_*() is needed, but when should the memory be released then?
> > There's no callback currently from the media device the driver could use.
> 
> It should be easy to add a release callback if you need. Yet, I think you
> don't need a callback for that. Instead, you could just use the already
> existing one at struct device, e. g. export media_devnode_release() and,
> on drivers that need to release additional data, you would be doing something
> like:
> 
> 	static void my_devnode_release(struct device *cd)
> 	{
> 		// Some code that would release things before kfree(dev)
> 		kthread_stop(foo_thread);
> 		kfree(foo);
> 
> 		// will internally do a kfree(dev)
> 		media_devnode_release(cd);
> 
> 		// Some code that would release things after kfree(dev)
> 		kfree(bar);
> 	}

I think we really want to make correct implementations easy for drivers,
without requiring them e.g. to use the media_devnode interface directly. As
device removal isn't serialised with IOCTLs, every driver would have to do
this in order to prevent driver and framework IOCTL handlers from operating
on released memory.

> 
> And set the new release callback after registering the media device with:
> 
> 	media_device_register(...);
> 	devnode->dev.release = my_devnode_release;
> 
> The advantage of such an approach is that it allows controlling the order
> in which things will be freed/released.

That's among the things the patchset does, but I think in a much nicer way.

> 
> > 
> > OTOH devm_*() interfaces are very convenient to use, it's a lot of extra
> > work for drivers to handle releasing all the resources. It'd be great to
> > find another object where to bind those resources. Still, device_release()
> > does first release devres resources and then calls the release() callback,
> > which obviously makes the setup problematic to begin with.
> 
> Shuah's approach is providing another way to bind things. Yet, maybe
> it could still be possible to use devm_*(), if it has a way to
> control when devm will free their resources. I suspect that, if you call
> devm_free() during dev.release() callback, or if you use the same struct
> device that is associated with the cdev, devm will work.

I wonder if we could use the media_devnode cdev's struct device to bind this
stuff to. It'd be gone when there's a certainty it'll no longer be needed.
The caveat is the release callback is called after the devres resources have
been released. So if a driver requires also the release callback, then it
has no longer access to memory allocated using devm_*() functions. I'd like
to have Laurent's opinion on this.
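
To illustrate the ordering caveat, here is a minimal user-space sketch. It is
not kernel code: struct sketch_dev and the sketch_*() helpers are hypothetical
names made up for this example, and malloc()/free() merely stand in for
devm_*() allocations. It only models the order described above, i.e. devres
resources go first and the release() callback runs afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device; not the kernel structure. */
struct sketch_dev {
	void *devres_mem;                        /* models devm_*() memory */
	void (*release)(struct sketch_dev *dev); /* models dev->release() */
};

/* Set by the release callback: was devres memory still there when it ran? */
static bool sketch_devres_seen_in_release;

/* Driver-side release callback; it can only observe what is left by then. */
static void sketch_driver_release(struct sketch_dev *dev)
{
	sketch_devres_seen_in_release = (dev->devres_mem != NULL);
}

/* Models the ordering discussed above: devres first, then release(). */
static void sketch_device_release(struct sketch_dev *dev)
{
	free(dev->devres_mem);
	dev->devres_mem = NULL;

	if (dev->release)
		dev->release(dev);
}
```

In this model the callback never sees the devres memory, which is why a
driver that needs both would have to keep such data outside devres.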

This solution is no longer enough once we have media devices from which
entities can be removed, as the memory of those entities would only be
released when the entire device is gone. In other words, there's a memory
leak until the media device is removed. I don't like that, although there
might still be very few practical problems.

> 
> > >   
> > > > 
> > > > With this patchset:
> > > > 
> > > > - The media_device which again contains the media_devnode is allocated
> > > >   dynamically. The lifetime of the media device --- and the media graph
> > > >   objects it contains --- is bound to device nodes that are bound to the
> > > >   media device (video and sub-device nodes) as well as open file handles.  
> > > 
> > > No. Data structures with cdev embedded into them have their lifetime
> > > controlled by the driver's core, and are destroyed only when there's
> > > no pending fops. The current approach uses device's core dev.release()  
> > 
> > Fair enough; that part is indeed handled towards the user space as far as I
> > can tell. However that's still not enough: the media graph contains the
> > graph objects, and the media device that holds the graph, must outlive the
> > graph objects themselves.

I meant to say that the media device, media graph and media graph objects
must stay around as long as they may be accessed from the user space. For
instance, the user may have a file handle opened from a video device, and the
media graph may be accessed through that file handle on media controller
enabled drivers. That's just one example.

This is what happens if you stop streaming in a pipeline after unbinding the
driver implementing the media device (same log as above):

<URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>

> 
> Sorry, didn't follow you here. What's the sense of not freeing the media
> graph before destroying the struct device associated with /dev/media0? 
> In other words, what should outlive after chardev's data is freed?
> 
> Please notice that the driver's core kobject kref ensures that the device
> release code is called only after all file descriptors are closed, and
> no other syscall would affect the cdev.
> 
> > Also removing entities doesn't really work currently: touching an entity, a
> > link or any kind of a graph object is not guaranteed to work unless you hold
> > the media graph lock. And that's simply unfeasible.
> 
> Sorry, again, didn't follow you here. The current strategy for adding
> and removing things at the graph relies on a lock, which serializes access
> to the graph, in order to avoid races if someone is trying to navigate
> the graph while an object is being inserted or removed.

The graph mutex is taken during graph walk but nothing guarantees that, say,
an entity that was obtained during the graph walk will stay around once the
graph mutex is released.

An alternative would be to add refcounts to entities. That'd allow removing
graph objects safely during the media device lifetime. The streaming would
certainly need to be stopped first though.
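
A rough user-space sketch of that alternative follows. It is only an
illustration under assumptions: struct sketch_entity and the sketch_*()
functions are hypothetical and do not exist in the kernel, a plain int stands
in for what would be a struct kref, and the graph mutex is only mentioned in
comments. The point is that a reference taken during the graph walk keeps the
entity valid after the lock is dropped, even if the entity is removed from the
graph meanwhile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted entity; not the kernel's struct media_entity. */
struct sketch_entity {
	int refcount;  /* a real implementation would use struct kref */
	bool in_graph; /* true while the entity is linked into the graph */
};

static int sketch_entities_freed; /* counts actual frees, for illustration */

/* Create an entity; the initial reference belongs to the graph. */
static struct sketch_entity *sketch_entity_create(void)
{
	struct sketch_entity *e = calloc(1, sizeof(*e));

	assert(e != NULL);
	e->refcount = 1;
	e->in_graph = true;
	return e;
}

static void sketch_entity_put(struct sketch_entity *e)
{
	if (--e->refcount == 0) {
		free(e);
		sketch_entities_freed++;
	}
}

/*
 * Models a graph walk: in real code this would run with the graph mutex
 * held, and the reference is taken before the mutex is released.
 */
static struct sketch_entity *sketch_graph_lookup(struct sketch_entity *e)
{
	if (!e->in_graph)
		return NULL;
	e->refcount++;
	return e;
}

/* Remove the entity from the graph and drop the graph's reference. */
static void sketch_graph_remove(struct sketch_entity *e)
{
	e->in_graph = false;
	sketch_entity_put(e);
}
```

A caller that obtained the entity through sketch_graph_lookup() can still
look at it after sketch_graph_remove(); the memory goes away only when that
caller calls sketch_entity_put().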

> 
> It could be converted into a lockless approach (for example, using RCU),
> but this is a separate issue.

V4L2 sub-devices, besides an entity, may contain a device node as well. The
data structures span multiple drivers and may span multiple sub-systems
(think of ALSA) as well. The media entity is embedded in a sub-device data
structure allocated by drivers. Drivers, also other drivers that walk the
media graph, do make use of this knowledge to obtain sub-devices and access
controls in them.

> 
> The removal code needs to use whatever lock (or lockless) schema we
> use to serialize the access to the graph.
> 
> > Just look at what the
> > drivers do with entities: they use the v4l2_subdev interface and the control
> > framework to access them.
> > 
> > These data structures contain struct media_entity in them, and that entity
> > is part of the media graph. Other drivers use entities e.g. to obtain
> > control values from them. References should be used to prevent releasing the
> > memory.
> 
> References are used by the driver's core, using kobject_get() and
> kobject_put(). That guarantees that dev.release() will only be called
> when nobody is using it anymore.

Yes, but this does not reach entities. Their lifetime is not related to
that.

> 
> > media_entity_get() and media_entity_put() do not do what you'd expect.
> 
> Please elaborate.

The functions simply get / put the module that owns the media entity. The
entities as such are not refcounted, and acquiring the driver's module does
not guarantee the entities aren't released.

> 
> > v4l2_subdev_call() should also verify that a sub-device is registered, and
> > make sure it will stay that way for the duration of call: the driver must be
> > able to expect the entity is accessible as the driver registered it.
> 
> Yes, but I can't see how this is related to this discussion. Before
> unregistering struct device, you need to unbind the subdevs.
> 
> The only case I can see of calling v4l2_subdev_call() after all file
> descriptors are closed is if you have some kthread running. You need to 
> call kthread_stop() for such kthreads before freeing struct device.
> 
> You could do it at a my_devnode_release() if you need the kthread running
> even after closing all file descriptors, or even before that, before
> calling media_device_unregister().
> 
> > The same goes for the control framework.
> 
> I don't think we have kthreads for controls. The control routines
> are called only when a file descriptor is opened. So, I don't see
> any possible issue with the control framework.

This isn't about kthreads; other drivers do this as well through
user-initiated actions such as starting or stopping streaming.

> 
> > As far as I remember, we somehow assumed that just acquiring the related
> > kernel modules would be enough to counter this but it is not.
> 
> Well, if not, you could use kobject_get() and kobject_put() to increment
> or decrement the cdev's refcount. Yet, I suspect that, if the drivers are 
> properly designed, you won't need to manually touch at the kref.

Entities are not refcounted. You can't get a kobject as there's none to get.

> 
> > 
> > I would prefer to postpone this however, the patchset already does enough
> > for a single patchset. Fixing this properly would likely require wait/wound
> > mutexes for individual entities.
> > 
> > > callback to release memory.
> > > 
> > > In other words, dev.release() is only called after the driver's base
> > > knows that the cdev is not in use anymore. So, no ioctl() or any
> > > other syscalls on that point.
> > > 
> > > Ok, nothing prevents some driver from doing the wrong thing, keeping a
> > > copy of struct device and using it after free, for example storing
> > > it in devm-allocated memory, and printing some debug message
> > > after struct device is freed, but this is a driver bug.
> > > 
> > > What really worries me about this series is that it seems that you
> > > didn't understand how the current approach works. So, you decided
> > > to just revert it and start from scratch. This is dangerous, as
> > > it could cause problems to other scenarios than yours.  
> > 
> > I'm not quite sure what do you mean.
> > 
> > It may well be that the patchset will require changes but that's precisely
> > the reason why patches are reviewed before merging.
> 
> From your comments and from your code, you didn't seem to realize that
> the current approach relies at the struct device refcount. See above.

That refcount is only for struct media_devnode. It's simply not enough, as
I've elaborated:

- No serialisation between IOCTL and releasing media device memory.

	- As a result, once the IOCTL call has begun, the media device may
	  be released, and the IOCTL handler can then access the released
	  memory.

- Drivers and frameworks that access the media device through other device
  nodes such as V4L2 devices will also access released memory.

There could be others.
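
As a rough illustration of what I mean by binding the lifetime to references
rather than to registration, here is a user-space sketch. It is not the code
of the patchset: struct sketch_mdev and all sketch_*() names are hypothetical,
and a plain int stands in for a kref. Each open file handle and each video
node holds a reference; unregistering only clears the registered state and
drops the driver's own reference, and the driver's release callback runs when
the last reference is gone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted media device; not the kernel's media_device. */
struct sketch_mdev {
	int refcount; /* a real implementation would use struct kref */
	bool registered;
	void (*release)(struct sketch_mdev *mdev); /* driver callback */
};

static int sketch_release_calls; /* counts release callback invocations */

static void sketch_count_release(struct sketch_mdev *mdev)
{
	(void)mdev;
	sketch_release_calls++;
}

/* Allocate the device; the initial reference belongs to the driver. */
static struct sketch_mdev *
sketch_mdev_alloc(void (*release)(struct sketch_mdev *mdev))
{
	struct sketch_mdev *mdev = calloc(1, sizeof(*mdev));

	assert(mdev != NULL);
	mdev->refcount = 1;
	mdev->registered = true;
	mdev->release = release;
	return mdev;
}

static void sketch_mdev_get(struct sketch_mdev *mdev)
{
	mdev->refcount++;
}

static void sketch_mdev_put(struct sketch_mdev *mdev)
{
	if (--mdev->refcount > 0)
		return;
	if (mdev->release)
		mdev->release(mdev); /* driver frees its own data here */
	free(mdev);
}

/* Unregistering is just unregistering: it releases no memory by itself. */
static void sketch_mdev_unregister(struct sketch_mdev *mdev)
{
	mdev->registered = false;
	sketch_mdev_put(mdev);
}

/* The caller holds a reference, so the memory is valid for the whole call. */
static int sketch_mdev_ioctl(struct sketch_mdev *mdev)
{
	return mdev->registered ? 0 : -ENODEV;
}
```

With an open file handle and a video node each holding a reference, an IOCTL
issued after unregistering fails cleanly instead of touching freed memory.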

> 
> > >   
> > > > - Care is taken that the unregistration process and releasing memory happens
> > > >   in the right order. This was not always the case previously.  
> > > 
> > > Freeing memory for struct media_devnode, struct device and struct cdev
> > > is currently handled by the driver's core, when it is known to be safe,
> > > and using the same logic that other subsystems do.
> > 
> > That's simply not the case. Other sub-systems do not have graphs managed by
> > multiple device drivers for multiple physical devices that expose device
> > nodes through which all of those devices can be accessed. The problem domain
> > is far more complex than if you had a single physical device for which a
> > driver would expose a device node or two to the user space.
> 
> No. The current approach uses the struct device associated with /dev/media0,
> created via cdev, to provide a refcount for the data associated with the
> character device.
> 
> The struct device kobject refcount ensures that everything associated
> with it will only be freed after the refcount goes to zero.
> 
> As I said before, if there are any cases where the refcount goes to
> zero early, it is just a matter of adding a few kobject_get() and
> kobject_put() to ensure that this won't happen, if the driver is
> so broken that it is unable to do the right refcount.

That's correct, but it only applies to struct media_devnode. Nothing else.
Please see above.

> 
> > 
> > > 
> > > We might do it different, but we need a strong reason to do it, as
> > > going away from the usual practice is dangerous.  
> > 
> > I think we already did that when we merged the original Media controller and
> > V4L2 sub-device patches...
> > 
> > >   
> > > > - The driver remains responsible for the memory of the video and sub-device
> > > >   nodes. However, now the Media controller provides a convenient callback to
> > > >   the driver to release any memory resources when the time has come to do
> > > >   so. This takes place just before the media device memory is released.  
> > > 
> > > Drivers could use devnode->dev.release for that. Of course, if they
> > > override it, they should be calling media_devnode_release() on their
> > > internal release functions.  
> > 
> > That'd be really hackish. The drivers currently don't deal with
> > media_devnode directly now and I don't think they should be obliged to.
> 
> I'm not against adding a callback instead. However, that makes it lose
> flexibility, as the callback will either be called before or after
> freeing struct device.
> 
> By overriding the dev.release callback, we have a finer control.
> 
> If you don't see any case where we'll be freeing data after freeing
> struct device, then a callback would work.
> 
> > >   
> > > > - Drivers that do not strictly need to be removable require no changes. The
> > > >   benefits of this set become tangible for any driver by changing how the
> > > >   driver allocates memory for the data structures. Ideally at least
> > > >   drivers for hot-removable devices should be converted.  
> > > 
> > > Drivers should allow device removal and/or driver removal. If you're
> > > doing any change here, you need to touch *all* drivers to use the new 
> > > way.  
> > 
> > Let's first agree on what needs to be fixed and how, and then think about
> > converting the drivers. Buggy code has a tendency to continue to be buggy
> > unless it is fixed (or replaced).
> 
> True, but as I said, this series creates buggy code when it ignores what
> was fixed already. Also, a patch series to be considered ready for
> upstream needs to do the needed changes on all drivers it affects.
> 
> > > > In order to make the current drivers to behave well it is necessary to make
> > > > changes to how memory is allocated in the drivers. If you look at the sample
> > > > patches that are part of the set for the omap3isp driver, you'll find that
> > > > around 95% of the changes are related to removing the use of the devm_()
> > > > family of functions rather than to Media controller API changes. In this regard, the
> > > > approach taken here requires very little if any additional overhead.  
> > > 
> > > Well, send the patches that do the 95% of the changes first e. g. devm_()
> > > removal, and check if you aren't using any dev_foo() printk after
> > > unregister, and send such patch series, without RFC. Then test what's
> > > still broken, if any and let's discuss with your results, in a way
> > > that we can all reproduce the issues you may be facing on other drivers
> > > that don't use devm*().  
> > 
> > As I said, there's currently no way to properly release these resources as
> > the driver won't receive a callback from media device release.
> 
> If you're so convinced that it is needed and you won't be overriding
> media device's struct device release callback, just add it. It should
> be a 3 lines patch.

Just the callback isn't enough. You need to get a reference to the kobject
when the graph components may be accessed.

Should we add reference counts to entities, we could add functions to get
references to entities, and make the media device their parent. That'd be a
largish change but it might not affect that many drivers after all.

> > 
> > > 
> > >   
> > > > On Wed, Nov 09, 2016 at 03:46:08PM -0200, Mauro Carvalho Chehab wrote:  
> > > > > Em Wed, 9 Nov 2016 10:00:58 -0700
> > > > > Shuah Khan <shuahkh@osg.samsung.com> escreveu:
> > > > >     
> > > > > > > Maybe we can get the Media Device Allocator API work in and then we can
> > > > > > > get your RFC series in after that. Here is what I propose:
> > > > > > > 
> > > > > > > - Keep the fixes in 4.9    
> > > > > 
> > > > > Fixes should always be kept. Reverting a fix is not an option.
> > > > > Instead, do incremental patches on the top of it.
> > > > >     
> > > > > > > - Get Media Device Allocator API patches into 4.9.      
> > > > > > 
> > > > > > I meant 4.10 not 4.9
> > > > > >     
> > > > > > > - snd-usb-audio work goes into 4.10
> > > > > 
> > > > > Sounds like a plan.
> > > > >     
> > > > > > > Then your RFC series could go in. I am looking at the RFC series and that
> > > > > > > the drivers need to change as well, so this RFC work could take longer.
> > > > > > > Since we have to make media_device sharable, it is necessary to have a
> > > > > > > global list approach Media Device Allocator API takes. So it is possible
> > > > > > > for your RFC series to go on top of the Media Device Allocator API.    
> > > > > 
> > > > > Firstly, the RFC series should be converted into something that can
> > > > > be applicable upstream, e. g.:
> > > > > 
> > > > > - doing the changes over the top of upstream, instead of needing to
> > > > >   revert patches;    
> > > > 
> > > > The patches are in fact on top of the current media-tree, or were when they
> > > > were sent (v4).
> > > > 
> > > > The reason I'm reverting patches is that the reason why these patches were
> > > > merged was not because they would have been a sound way forward for the
> > > > Media controller framework, but because they partially worked around issues
> > > > in a device being in use while it was removed.
> > > > 
> > > > They never were a complete fix for these problems nor do I think they could
> > > > be extended to be such. There were also unaddressed issues in these patches
> > > > pointed out during the review. For these reasons I'm reverting the three
> > > > patches. In more detail:
> > > > 
> > > > * media: fix media devnode ioctl/syscall and unregister race
> > > >   6f0dd24a084a
> > > > 
> > > > The patch clears the registered bit before performing the steps related to
> > > > unregistering a media device, but the bit is checked only at the beginning
> > > > of the IOCTL call. As unregistering a device and an IOCTL call on a file
> > > > handle of that device are not serialised, nothing guarantees the IOCTL call
> > > > will finish with the registered bit still in the same state. Serialising the
> > > > two e.g. by using a mutex is hardly a feasible solution for this.
> > > > 
> > > > I may have pointed out the original problem but this is not the solution.
> > > > 
> > > > <URL:http://www.spinics.net/lists/linux-media/msg101295.html>
> > > > 
> > > > The right solution is instead to make sure the data structures related to
> > > > the media device will not disappear while the IOCTL call is in progress (at
> > > > least).  
> > > 
> > > They won't. The device core won't call dev.release() until an ioctl
> > > finishes. So, the struct device and struct devnode will exist while the
> > > ioctl (or any other fops) is handled.
> > 
> > I believe you're right when it comes to drivers using video devices without
> > Media controller. However, the Media devices and V4L2 sub-device nodes are
> > another matter, as are the drivers. The drivers need to be able to rely on
> > the frameworks to support them. On MC the driver simply has no way to
> > release the media device at the right time. The same applies to V4L2
> > sub-devices --- something that could be added to the patchset.
> 
> Huh? What's the sense of removing /dev/media0 and their associated
> struct device before releasing the media graph?
> 
> The problem here is exactly the same as *any* other character device:
> you need *first* to stop using whatever data struct is needed for
> controlling /dev/media device and *then* removing /dev/media and
> freeing their data structures, including struct device.

I don't disagree about that particular point.

(Please see the beginning of the message as well.)

> 
> > >   
> > > > * media: fix use-after-free in cdev_put() when app exits after driver unbind
> > > >   5b28dde51d0c
> > > > 
> > > > The patch avoids the problem of deleting a character device (cdev_del())
> > > > after its memory has been released. The change is sound as such but the
> > > > problem is addressed by another, a lot more simple patch in my series:
> > > > 
> > > > <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>  
> > > 
> > > Your approach is not clean, as it is based on a cdev's hack of doing:
> > > 
> > > 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
> > > 
> > > That is an ugly hack, as it touches inside cdev's internal stuff,
> > > to do something that the driver's core doesn't expect. This is the
> > > kind of patch that could cause messy errors, by cheating with the
> > > cdev's internal refcount checking.
> > > 
> > > Btw, your approach requires changes on *all* drivers, in order to make
> > > device release work, which is way more complex than changing just the
> > > core, as the current approach does.
> > >   
> > > > It might be possible to reasonably continue from here if the next patch to
> > > > be reverted did not depend on this one.
> > > > 
> > > > * media-device: dynamically allocate struct media_devnode
> > > > 
> > > > This creates a two-way dependency between struct media_devnode and
> > > > media_device. This is very much against the original design which clearly
> > > > separates the two: media_devnode is entirely independent of media_device.  
> > > 
> > > Those structs are still independent.
> > >   
> > > > The original intent was that another sub-system in the kernel such as the
> > > > V4L2 could make use of media_devnode as well and while that hasn't happened,
> > > > perhaps the two could be merged. There simply are no other reasons to keep
> > > > the two structs separate.
> > > > 
> > > > The patch is certainly a workaround, as it (partially, again) works around
> > > > issues in timing of releasing memory and accessing it.
> > > > 
> > > > > The proper solutions regarding the media_device and media_devnode are either
> > > > > to maintain the separation or to unify the two, and this patch does neither
> > > > > of these. To the contrary: it makes either of these impossible by
> > > > > design, and this reason alone is enough to revert it.
> > > > 
> > > > The set I'm pushing maintains the separation and leaves the option of either
> > > > merging the two (media_device and media_devnode) or making use of
> > > > media_devnode elsewhere open.  
> > > 
> > > As mentioned before, being based on a hack doesn't make it nice
> > > for upstream merging.
> > > 
> > > The current approach uses the recommended way: the structure with
> > > cdev embedded should be dynamically allocated. Well, we could merge
> > > media_device and media_devnode, but, in this case, we'll need to
> > > not embed media_device, in order to avoid hacks like the above.  
> > 
> > The current approach is simply not enough, be cdev allocated separately from
> > > media_devnode or not: the drivers have no way to properly release memory
> > related to the media devices nor the v4l2 sub-devices. That memory will get
> > accessed through IOCTL calls: simply checking that a device was registered
> > at one point does not mean it continues to be registered in another point of
> > time in the future, unless the two operations are serialised in a way or
> > another.
> 
> Huh? The current approach relies on kref.
> 
> > >   
> > > > > - change all drivers as the kAPI changes;    
> > > > 
> > > > The patchset actually adds new APIs rather than changing the OLD one --- as
> > > > the old one was simply that drivers were responsible for allocating the data
> > > > structures related to a media device. Existing drivers should continue to
> > > > work as they did before without changes.  
> > > 
> > > Are you sure? Did you try the tests we did with binding/unbind, device
> > > removal/insert and probe/remove of em28xx with your patches applied?  
> > 
> > I haven't tested that but as a matter of fact, I think I indeed have such
> > device so I could test it. Changes on the DVB side would be needed as well
> > in order to benefit from the API for allocating the media device.
> > 
> > > 
> > > > In that regard, you should really test it on a USB device, which
> > > > is hot-pluggable. There, you'll see a lot more memory lifetime issues
> > > > than on omap3.  
> > 
> > > I'm not so sure about USB devices: unbinding works the same way whether or
> > > not the device is actually hot-pluggable. Still, testing with different kinds
> > > of devices definitely does help to root out issues, that's for sure.
> > 
> > >   
> > > > > Naturally, to get the full benefits of the changes, driver changes will also be
> > > > required (see the beginning of the message).  
> > > 
> > > > The test cases we did work on em28xx. If a regression happens after any
> > > > patch of this series, you need to address it. I suspect that, even applying
> > > > the entire series, there will still be regressions, as I don't see any
> > > > changes to em28xx on this patch series.  
> > 
> > That's true, I've only changed the omap3isp driver so far as I wanted to get
> > feedback on the framework changes.
> > 
> > >   
> > > > The set has been posted as RFC in order to get reviews. It makes no sense to
> > > > convert all the drivers and then start changing APIs, affecting all those
> > > > converted drivers.  
> > > 
> > > > Well, while it is not complete and still causes regressions, it can't be
> > > > considered ready for upstream review.
> > >   
> > > > > 
> > > > > - be git bisectable, e. g. all patches should compile and run fine
> > > > >   after each single patch, without introducing regressions.    
> > > > 
> > > > Compilation has already been tested (on ARM) on each patch applied in order.  
> > > 
> > > Good, but the best is to test it also on x86. Please notice that
> > > just compiling doesn't ensure that it doesn't introduce regressions.
> > > 
> > > You should do your best to avoid regressions on every single patch
> > > on your patch series.  
> > 
> > > Certainly. Other than that, there would be fewer patches than there are
> > > now...
> > 
> > >   
> > > > > 
> > > > > That probably means that the series should be tested not only on
> > > > > omap3, but also on some other device drivers.    
> > > > 
> > > > I fully agree with that. More review, testing and changes to at least some
> > > > drivers (mostly for removable devices) will be needed before merging them,
> > > > that's for sure.  
> > > 
> > > Good! One more point we agree :-)  
> > 
> > That's progress. It's a good start but we need more than that.
> > 
> 
> Thanks,
> Mauro

-- 
Regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-13 10:53                   ` Sakari Ailus
@ 2016-12-13 12:24                     ` Mauro Carvalho Chehab
  2016-12-13 22:23                       ` Shuah Khan
  2016-12-15 11:30                       ` Sakari Ailus
  0 siblings, 2 replies; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-13 12:24 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Shuah Khan, Sakari Ailus, linux-media, hverkuil, laurent.pinchart

Em Tue, 13 Dec 2016 12:53:05 +0200
Sakari Ailus <sakari.ailus@iki.fi> escreveu:

> Hi Mauro,
> 
> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> > Hi Sakari,
> > 
> > I answered you point to point below, but I suspect that you missed how the 
> > current approach works. So, I decided to write a quick summary here.
> > 
> > The character devices /dev/media? are created via cdev, which relies on a 
> > kobject per device, which has an embedded struct kref inside.
> > 
> > Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> > struct device, when the code does:
> > 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
> > 
> > before calling cdev_add().
> > 
> > The current lifetime management is actually based on cdev's kobject's
> > refcount, provided by its embedded kref.
> > 
> > The kref guarantees that any data associated with /dev/media0 won't be 
> > freed if there are any pending system calls. In other words, when 
> > cdev_del() is called, it will remove /dev/media0 from the filesystem, and
> > will call kobject_put(). 
> > 
> > If the refcount is zero, it will call devnode->dev.release(). If the 
> > kobject refcount is not zero, the data won't be freed.
> > 
> > So, in the best case scenario, there are no open file descriptors
> > by the time the media device node is unregistered. So, it will free
> > everything.
> > 
> > In the worst case scenario, e. g. when the driver is removed or 
> > unbound while /dev/media0 has some open file descriptor(s),
> > the cdev logic will do the proper lifetime management.
> > 
> > In such a case, /dev/media0 disappears from the file system, so another open
> > is not possible anymore. The data structures will remain allocated until
> > all associated file descriptors are closed.
> > 
> > When all file descriptors are closed, the data will be freed.
> > 
> > At that time, it will call an optional dev.release() callback,
> > responsible for freeing any other data struct that the driver allocated.  
> 
> The patchset does not change this. It's not a question of the media_devnode
> struct either. That's not an issue.
> 
> The issue is rather what else can be accessed through the media device and
> other interfaces. As IOCTLs are not serialised with device removal (which
> now releases much of the data structures) 

Huh? ioctls are serialized with struct device removal. The driver core
guarantees that.

> there's a high chance of accessing
> released memory (or mutexes that have been already destroyed). An example of
> that is here, stopping a running pipeline after unbinding the device. What
> happens there is that the media device is released whilst it's in use
> through the video device.
> 
> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>

It is not clear from the logs what the driver tried to do, but
that sounds like a driver bug: it was not prepared to properly
handle unbinds.

The problem here is that isp_video_release() is called by V4L2
release logic, and not by the MC one:

static const struct v4l2_file_operations isp_video_fops = {
	.owner		= THIS_MODULE,
	.open		= isp_video_open,
	.release	= isp_video_release,
	.poll		= vb2_fop_poll,
	.unlocked_ioctl	= video_ioctl2,
	.mmap		= vb2_fop_mmap,
};

It seems that the driver's logic allows it to be called before or
after destroying the MC.

Assuming that unbinding works when the OMAP3 device is not in use,
this means that no errors happen if isp_video_release() is called
first. But if the MC is destroyed before V4L2 calls its .release()
callback, there's no logic in the driver that would detect it, so
isp_video_release() will call isp_video_streamoff(), which depends
on the MC to work.

At first glance, I can see two ways of fixing it:

1) to increment devnode's device kobject refcount at OMAP3 .probe(), 
decrementing it only at isp_video_release(). That will ensure that
MC will only be removed after V4L2 removal.

2) to call isp_video_streamoff() before removing the MC stuff, e. g.
inside the MC .release() callback. 

That could be done by overriding the dev.release() callback in the
omap3 driver, as I discussed in my past e-mails, and flagging to the
driver that it should not accept streamon anymore, as the hardware
is being disconnected.
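
As a rough illustration of option 1 above, here is a user-space model
(not driver code) of pinning the MC from probe until the video node
release. All names (model_mc, model_video, model_probe and so on) are
hypothetical; in a real driver the reference would be taken on the
devnode's struct device. It only shows the ordering that the extra
reference enforces:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the refcounted object behind the media devnode. */
struct model_mc {
	int refcount;
	bool released;	/* set instead of freeing, so the state can be inspected */
};

/* Hypothetical stand-in for a video node that depends on the MC. */
struct model_video {
	struct model_mc *mc;
	bool streaming;
};

static void model_mc_get(struct model_mc *mc)
{
	mc->refcount++;
}

static void model_mc_put(struct model_mc *mc)
{
	assert(mc->refcount > 0);
	if (--mc->refcount == 0)
		mc->released = true;	/* dev.release() would run here */
}

/* probe: the MC starts with one reference, the video node takes another */
static void model_probe(struct model_mc *mc, struct model_video *vdev)
{
	mc->refcount = 1;
	mc->released = false;
	vdev->mc = mc;
	vdev->streaming = false;
	model_mc_get(mc);
}

/* unbind: drops only the reference taken at registration */
static void model_unbind(struct model_mc *mc)
{
	model_mc_put(mc);
}

/* video release: stream off while the MC is still valid, then unpin it */
static int model_video_release(struct model_video *vdev)
{
	if (vdev->mc->released)
		return -1;	/* would have been a use-after-free */
	vdev->streaming = false;
	model_mc_put(vdev->mc);
	return 0;
}
```

With the extra reference, an unbind while streaming leaves the MC
alive until model_video_release() has run.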

Btw, that explains a lot why Shuah can't reproduce the stuff you're
complaining about on her USB hardware.

The USB subsystem has a .disconnect() callback that notifies
the drivers that a device was unbound (likely physically removed).
The way USB media drivers handle it is by returning -ENODEV to any
V4L2 call that would try to touch the hardware after the unbind.

So, on au0828, there's no need to add any extra release logic.
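
A minimal sketch of that pattern, as a single-threaded user-space
model. The struct and function names are made up for illustration and
are not taken from au0828 or any other driver; a real driver would
also hold a lock around the flag:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state. */
struct model_dev {
	bool disconnected;	/* set once the device has been unbound */
	int hw_accesses;	/* counts how often the "hardware" was touched */
};

/* Stand-in for the bus .disconnect() callback. */
static void model_disconnect(struct model_dev *dev)
{
	dev->disconnected = true;
}

/* Stand-in for an ioctl handler that needs the hardware. */
static int model_ioctl_streamon(struct model_dev *dev)
{
	if (dev->disconnected)
		return -ENODEV;	/* refuse instead of touching the hardware */

	dev->hw_accesses++;
	return 0;
}
```

After model_disconnect(), every call fails early with -ENODEV and the
hardware access counter no longer changes.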

> <URL:http://www.spinics.net/lists/linux-media/msg108943.html>
> 
> > 
> > Em Mon, 28 Nov 2016 12:45:56 +0200
> > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> >   
> > > Hi Mauro,
> > > 
> > > On Tue, Nov 22, 2016 at 03:44:29PM -0200, Mauro Carvalho Chehab wrote:  
> > > > Em Mon, 14 Nov 2016 15:27:22 +0200
> > > > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> > > >     
> > > > > Hi Mauro,
> > > > > 
> > > > > I'm replying below but let me first summarise the remaining problem area
> > > > > that this patchset addresses.    
> > > > 
> > > > Sorry for answering too late. Somehow, I missed this email in the cloud.
> > > >     
> > > > > The problems you and Shuah have seen and partially addressed are related to
> > > > > a larger picture which is the lifetime of (mostly) memory resources related
> > > > > to various objects used by both the Media controller and V4L2
> > > > > frameworks (including videobuf2) as well as the drivers which make use of
> > > > > these frameworks.
> > > > > 
> > > > > The Media controller and V4L2 interfaces exposed by drivers consist of
> > > > > multiple device nodes, data structures with interdependencies within the
> > > > > frameworks themselves and dependencies from the driver's own data structures
> > > > > towards the framework data structures. The Media device and the media graph
> > > > > objects are central to the problem area as well.
> > > > > 
> > > > > So what are the issues then? Until now, we've attempted to regulate the
> > > > > users' ability to access the devices at the time they're being unregistered
> > > > > (and the associated memory released), but that approach does not really
> > > > > scale: you have to make sure that the unregistering also will not take place
> > > > > _during_ the system call --- not just in the beginning of it.
> > > > >
> > > > > The media graph contains media graph objects, some of which are media
> > > > > entities (contained in struct video_device or struct v4l2_subdev, for
> > > > > instance). Media entities as graph nodes have links to other entities. In
> > > > > order to implement the system calls, the drivers do parse this graph in
> > > > > order to obtain information they need to obtain from it. For instance, it's
> > > > > not uncommon for an implementation for video node format enumeration to
> > > > > figure out which sub-device the link from that video node leads to. Drivers
> > > > > may also have similar paths they follow.
> > > > > 
> > > > > Interrupt handling may also be taking place during the device removal during
> > > > > which a number of data structures are now freed. This really does call for a
> > > > > solution based on reference counting.
> > > > > 
> > > > > This leads to the conclusion that all the memory resources that could be
> > > > > accessed by the drivers or the kernel frameworks must stay intact until the
> > > > > last file handle to the said devices is closed. Otherwise, there is a
> > > > > possibility of accessing released memory.    
> > > > 
> > > > So far, we're aligned.
> > > >     
> > > > > Right now in a lot of the cases, such as for video device and sub-device
> > > > > nodes, we do release the memory when a device (as in struct device) is being
> > > > > unregistered. There simply is no way in the current mainline kernel to do
> > > > > this in a safe way.    
> > > >     
> > > > > Drivers do use devm_() family of functions to allocate
> > > > > the memory of the media graph object and their internal data structures.    
> > > > 
> > > > Removing devm_() from those drivers seem to be the first thing to do,
> > > > and it is independent from any MC rework.
> > > > 
> > > > As you'll see below, we have different opinions on other matters,
> > > > so, my suggestion about how to proceed is that you should submit
> > > > first the things we're aligned.
> > > > 
> > > > In other words, please submit the patches that get rid of devm_()
> > > > first. Then, we can address the remaining stuff.    
> > > 
> > > Removing devm_*() is needed, but when should the memory be released then?
> > > There's no callback currently from the media device the driver could use.  
> > 
> > It should be easy to add a release callback if you need. Yet, I think you
> > don't need a callback for that. Instead, you could just use the already
> > existing one at struct device, e. g. export media_devnode_release() and,
> > on drivers that need to release additional data, you would be doing something
> > like:
> > 
> > 	static void my_devnode_release(struct device *cd)
> > 	{
> > 		// Some code that would release things before kfree(dev)
> > 		kthread_stop(foo_thread);
> > 		kfree(foo);
> > 
> > 		// will internally do a kfree(dev)
> > 		media_devnode_release(cd);
> > 
> > 		// Some code that would release things after kfree(dev)
> > 		kfree(bar);
> > 	}  
> 
> I think we really want to make correct implementations easy for drivers, not
> requiring e.g. to use the media_devnode interface directly. As device
> removal isn't serialised with IOCTLs, every driver should do this in order
> to prevent device driver's / framework IOCTL handlers operating on released
> memory.

Well, it would be easy to add a callback that media_devnode_release()
would call on drivers that need it.
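
Something along these lines, shown here as a user-space model rather
than a patch. The release_cb field and all model_* names are
hypothetical and do not exist in the current media_devnode code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct media_devnode. */
struct model_devnode {
	/* optional driver hook, assumed for this sketch */
	void (*release_cb)(struct model_devnode *devnode);
	int cb_calls;	/* how many times the hook ran */
	bool freed;	/* set instead of calling kfree() */
};

/* Stand-in for media_devnode_release(): run the hook, then free. */
static void model_devnode_release(struct model_devnode *devnode)
{
	if (devnode->release_cb)
		devnode->release_cb(devnode);

	devnode->freed = true;
}

/* Example driver hook: runs while the devnode is still valid. */
static void example_driver_release(struct model_devnode *devnode)
{
	assert(!devnode->freed);
	devnode->cb_calls++;
}
```

Drivers that leave release_cb as NULL see no change in behaviour. The
hook only runs before the free, which is the loss of flexibility
compared to overriding dev.release directly.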

As I said before, USB drivers don't need anything extra at devnode
release. I'd say more, even PCI drivers won't likely need it, as they
don't use MC to do things like streamoff.

I suspect that such special .release() logic is only needed on drivers
that don't work without MC, e. g. subdev-based ones.

> 
> > 
> > And set the new release callback after registering the media device with:
> > 
> > 	media_device_register(...);
> > 	devnode->dev.release = my_devnode_release;
> > 
> > The advantage of such approach is that it allows to control the order
> > where things will be freed/released.  
> 
> That's among the things the patchset does, but I think in a much nicer way.

A 21-patch series that breaks release on all drivers but OMAP3 doesn't seem
to be a cleaner/nicer approach.

> > > OTOH devm_*() interfaces are very convenient to use, it's a lot of extra
> > > work for drivers to handle releasing all the resources. It'd be great to
> > > find another object where to bind those resources. Still, device_release()
> > > does first release devres resources and then calls the release() callback,
> > > which obviously makes the setup problematic to begin with.  
> > 
> > Shuah's approach is providing another way to bind things. Yet, maybe
> > it could still be possible to use devm_*(), if it has a way to
> > control when devm will free their resources. I suspect that, if you call
> > devm_free() during dev.release() callback, or if you use the same struct
> > device that is associated with the cdev, devm will work.  
> 
> I wonder if we could use the media_devnode cdev's struct device to bind this
> stuff to. It'd be gone when there's a certainty it'll no longer be needed.

Maybe, but the real problem here is that some data are associated to
MC, while others are associated with V4L2. 

If you can identify what data is associated to MC, and provide a way
to handle MC ".disconnect()" so that V4L2 won't be trying to use the
MC-related data, then it would be safe to use devm to allocate memory.

> The caveat is the release callback is called after the devres resources have
> been released. So if a driver requires also the release callback, then it
> has no longer access to memory allocated using devm_*() functions. I'd like
> to have Laurent's opinion on this.
> 
> This solution is no longer enough when we have media devices where you can
> remove entities, as those would only be released when the entire device is
> gone. Or, there's a memory leak until removal of the media device. I don't
> like that albeit there might be still very few practical problems.

Agreed.

> 
> >   
> > > >     
> > > > > 
> > > > > With this patchset:
> > > > > 
> > > > > - The media_device which again contains the media_devnode is allocated
> > > > >   dynamically. The lifetime of the media device --- and the media graph
> > > > >   objects it contains --- is bound to device nodes that are bound to the
> > > > >   media device (video and sub-device nodes) as well as open file handles.    
> > > > 
> > > > No. Data structures with cdev embedded into them have their lifetime
> > > > controlled by the driver's core, and are destroyed only when there's
> > > > no pending fops. The current approach uses device's core dev.release()    
> > > 
> > > Fair enough; that part is indeed handled towards the user space as far as I
> > > can tell. However that's still not enough: the media graph contains the
> > > graph objects, and the media device that holds the graph, must outlive the
> > > graph objects themselves.  
> 
> I meant to say that the media device, media graph and media graph objects
> must stay around as long as they may be accessed from the user space. For
> instance, the user may have a file handle opened from a video device, and the
> media graph may be accessed through that file handle on media controller
> enabled drivers. That's just one example.
> 
> This is what happens if you stop streaming in a pipeline after unbinding the
> driver implementing the media device (same log as above):
> 
> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>

As I said, if this is a requirement, you could increase the kobject's 
reference count. If not, you could just stop streaming when the media
device is gone.

> 
> > 
> > Sorry, didn't follow you here. What's the sense of not freeing the media
> > graph before destroying the struct device associated with /dev/media0? 
> > In other words, what should outlive after chardev's data is freed?
> > 
> > Please notice that the driver's core kobject kref ensures that the device
> > release code is called only after all file descriptors are closed, and
> > no other syscall would affect the cdev.
> >   
> > > Also removing entities doesn't really work currently: touching an entity, a
> > > link or any kind of a graph object is not guaranteed to work unless you hold
> > > the media graph lock. And that's simply unfeasible.  
> > 
> > Sorry, again, didn't follow you here. The current strategy for adding
> > and removing things at the graph relies on a lock, with serializes access
> > to the graph, in order to avoid races if someone is trying to navigate on
> > the graph while an object is being inserted or removed.  
> 
> The graph mutex is taken during graph walk but nothing guarantees that, say,
> an entity that was obtained during the graph walk will stay around once the
> graph mutex is released.
> 
> An alternative would be to add refcounts to entities. That'd allow removing
> graph objects safely during the media device lifetime. The streaming would
> certainly need to be stopped first though.

My first MC patch series added refcounts to all graph objects[1] ;)

[1] https://patchwork.linuxtv.org/patch/30766/

My original idea was to increment entities' kref when links were
created. This way, it would be easier to clean up stuff, as they
could be destroyed in any order, especially with dynamic entity
creation/removal. Also, graph traversal could increment links' 
krefs to prevent them from being destroyed while navigating on them,
without needing to keep the media lock held for a long time.

I removed it on the second version because Laurent was unable to see
any usage for that, but, IMHO, if properly implemented, it would
help to support dynamic entities removal/insertion.

So, I don't mind if someone would send a patch adding it again.
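
To make the idea concrete, a small user-space model of links pinning
their endpoint entities. The names are hypothetical and this is not
the code from the patch in [1]; it only shows why the teardown order
stops mattering once links hold references:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical refcounted entity. */
struct model_entity {
	int refcount;
	bool freed;	/* set instead of freeing, so the state can be inspected */
};

/* Hypothetical link holding a reference on both endpoints. */
struct model_link {
	struct model_entity *source;
	struct model_entity *sink;
};

static void model_entity_init(struct model_entity *e)
{
	e->refcount = 1;	/* reference owned by the creating driver */
	e->freed = false;
}

static void model_entity_get(struct model_entity *e)
{
	e->refcount++;
}

static void model_entity_put(struct model_entity *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		e->freed = true;
}

/* Creating a link pins both endpoints. */
static void model_link_create(struct model_link *link,
			      struct model_entity *source,
			      struct model_entity *sink)
{
	model_entity_get(source);
	model_entity_get(sink);
	link->source = source;
	link->sink = sink;
}

/* Destroying a link unpins both endpoints. */
static void model_link_destroy(struct model_link *link)
{
	model_entity_put(link->source);
	model_entity_put(link->sink);
}
```

A driver can drop its own entity reference before or after the link
is destroyed. The entity only goes away once both are gone.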

> > It could be converted into a lockless approach (for example, using RCU),
> > but this is a separate issue.  
> 
> V4L2 sub-devices, besides an entity, may contain a device node as well. The
> data structures span multiple drivers and may span multiple sub-systems
> (think of ALSA) as well. The media entity is embedded in a sub-device data
> structure allocated by drivers. Drivers, also other drivers that walk the
> media graph, do make use of this knowledge to obtain sub-devices and access
> controls in them.

Well, a kref-based approach would avoid locks most of the time.
Perhaps it could be combined with RCU.

> 
> > 
> > The removal code needs to use whatever lock (or lockless) schema we
> > use to serialize the access to the graph.
> >   
> > > Just look at what the
> > > drivers do with entities: they use the v4l2_subdev interface and the control
> > > framework to access them.
> > > 
> > > These data structures contain struct media_entity in them, and that entity
> > > is part of the media graph. Other drivers use entities e.g. to obtain
> > > control values from them. References should be used to prevent releasing the
> > > memory.  
> > 
> > References are used by the driver's core, using kobject_get() and
> > kobject_put(). That warrants that dev.release() will only be called
> > when nobody is using it anymore.  
> 
> Yes, but this does not reach entities. Their lifetime is not related to
> that.

No. That's why, currently, we need to lock before adding/removing
graph objects.

> >   
> > > media_entity_get() and media_entity_put() do not do what you'd expect.  
> > 
> > Please elaborate.  
> 
> The functions simply get / put the module that owns the media entity. The
> entities as such are not refcounted, and acquiring the driver's module does
> not guarantee the entities aren't released.

Yes.

> >   
> > > v4l2_subdev_call() should also verify that a sub-device is registered, and
> > > make sure it will stay that way for the duration of call: the driver must be
> > > able to expect the entity is accessible as the driver registered it.  
> > 
> > Yes, but I can't see how this is related to this discussion. Before
> > unregistering struct device, you need to unbind the subdevs.
> > 
> > The only case I can see of calling v4l2_subdev_call() after all file
> > descriptors are closed is if you have some kthread running. You need to 
> > call kthread_stop() for such kthreads before freeing struct device.
> > 
> > You could do it at a my_devnode_release() if you need the kthread running
> > even after closing all file descriptors, or even before that, before
> > calling media_device_unregister().
> >   
> > > The same goes for the control framework.  
> > 
> > I don't think we have kthreads for controls. The control routines
> > are called only when a file descriptor is opened. So, I don't see
> > any possible issue with the control framework.  
> 
> This isn't about kthreads; other drivers do this as well through
> user-initiated actions. Such as starting or stopping streaming.

As explained before, either streamoff should happen during MC removal
or you need to increment the kobject refcount to serialize the removal
order.

> >   
> > > As far as I remember, we somehow assumed that just acquiring the related
> > > kernel modules would be enough to counter this but it is not.  
> > 
> > Well, if not, you could use kobject_get() and kobject_put() to increment
> > or decrement the cdev's refcount. Yet, I suspect that, if the drivers are 
> > properly designed, you won't need to manually touch at the kref.  
> 
> Entities are not refcounted. You can't get a kobject as there's none to get.

Entity removal is protected via mutex.

> >   
> > > 
> > > I would prefer to postpone this however, the patchset already does enough
> > > for a single patchset. Fixing this properly would likely require wait/wound
> > > mutexes for individual entities.
> > >   
> > > > callback to release memory.
> > > > 
> > > > In other words, dev.release() is only called after the driver's base
> > > > knows that the cdev is not in use anymore. So, no ioctl() or any
> > > > other syscalls on that point.
> > > > 
> > > > Ok, nothing prevents some driver to do the wrong thing, keeping a
> > > > copy of struct device and using it after free, for example storing
> > > > it on a devm allocated memory, and printing some debug message
> > > > after struct device is freed, but this is a driver's bug.
> > > > 
> > > > What really worries me on this series is that it seemed that you 
> > > > didn't understand how the current approach works. So, you decided
> > > > to just revert it and start from scratch. This is dangerous, as
> > > > it could cause problems to other scenarios than yours.    
> > > 
> > > I'm not quite sure what do you mean.
> > > 
> > > It may well be that the patchset will require changes but that's precisely
> > > the reason why patches are reviewed before merging.  
> > 
> > From your comments and from your code, you didn't seem to realize that
> > the current approach relies at the struct device refcount. See above.  
> 
> That refcount is only for struct media_devnode. It's simply not enough, as
> I've elaborated:
> 
> - No serialisation between IOCTL and releasing media device memory.
> 
> 	- This causes that once the IOCTL call has begun, media device may
> 	  be released, and this released memory can be accessed by the IOCTL
> 	  handler.
> 
> - Drivers and frameworks that access the media device through other device
>   nodes such as V4L2 devices will also access released memory.
> 
> There could be others.

See above.


> 
> >   
> > > >     
> > > > > - Care is taken that the unregistration process and releasing memory happens
> > > > >   in the right order. This was not always the case previously.    
> > > > 
> > > > Freeing memory for struct media_devnode, struct device and struct cdev 
> > > > is currently handled by the driver's core, when it is known to be safe,
> > > > and using the same logic that other subsystems do.    
> > > 
> > > That's simply not the case. Other sub-systems do not have graphs managed by
> > > multiple device drivers for multiple physical devices that expose device
> > > nodes through which all of those devices can be accessed. The problem domain
> > > is far more complex than if you had a single physical device for which a
> > > driver would expose a device node or two to the user space.  
> > 
> > No. The current approach uses the struct device associated with /dev/media0,
> > created via cdev, to provide a refcount for the data associated with the
> > character device.
> > 
> > The struct device kobject refcount ensures that everything associated
> > with it will only be freed after the refcount goes to zero.
> > 
> > As I said before, if there are any cases where the refcount is going
> > early to zero, it is just a matter of adding a few kobject_get() and
> > kobject_put() to ensure that this won't happen early, if the driver is
> > so broken that it is unable to do the right refcount.  
> 
> That's correct, but it only applies to struct media_devnode. Nothing else.
> Please see above.

Well, don't remove entities before you stop using them.

> > > > We might do it different, but we need a strong reason to do it, as
> > > > going away from the usual practice is dangerous.    
> > > 
> > > I think we already did that when we merged the original Media controller and
> > > V4L2 sub-device patches...
> > >   
> > > >     
> > > > > - The driver remains responsible for the memory of the video and sub-device
> > > > >   nodes. However, now the Media controller provides a convenient callback to
> > > > >   the driver to release any memory resources when the time has come to do
> > > > >   so. This takes place just before the media device memory is released.    
> > > > 
> > > > Drivers could use devnode->dev.release for that. Of course, if they
> > > > override it, they should be calling media_devnode_release() on their
> > > > internal release functions.    
> > > 
> > > That'd be really hackish. The drivers currently don't deal with
> > > media_devnode directly now and I don't think they should be obliged to.  
> > 
> > I'm not against adding a callback instead. However, that makes it lose
> > flexibility, as the callback will either be called before of after
> > freeing struct device.
> > 
> > By overriding the dev.release callback, we have a finer control.
> > 
> > If you don't see any case where we'll be freeing data after freeing
> > struct device, then a callback would work.
> >   
> > > >     
> > > > > - Drivers that do not strictly need to be removable require no changes. The
> > > > >   benefits of this set become tangible for any driver by changing how the
> > > > >   driver allocates memory for the data structures. Ideally at least
> > > > >   drivers for hot-removable devices should be converted.    
> > > > 
> > > > Drivers should allow device removal and/or driver removal. If you're
> > > > doing any change here, you need to touch *all* drivers to use the new 
> > > > way.    
> > > 
> > > Let's first agree on what needs to be fixed and how, and then think about
> > > converting the drivers. Buggy code has a tendency to continue to be buggy
> > > unless it is fixed (or replaced).  
> > 
> > True, but as I said, this series creates buggy code when it ignores what
> > was already fixed. Also, for a patch series to be considered ready for
> > upstream, it needs to make the needed changes in all drivers it affects.
> >   
> > > > > In order to make the current drivers behave well it is necessary to make
> > > > > changes to how memory is allocated in the drivers. If you look at the sample
> > > > > patches that are part of the set for the omap3isp driver, you'll find that
> > > > > around 95% of the changes are related to removing the use of the devm_() family
> > > > > of functions rather than to Media controller API changes. In this regard, the
> > > > > approach taken here requires very little if any additional overhead.    
> > > > 
> > > > Well, send the patches that do the 95% of the changes first e. g. devm_()
> > > > removal, and check if you aren't using any dev_foo() printk after
> > > > unregister, and send such patch series, without RFC. Then test what's
> > > > still broken, if any and let's discuss with your results, in a way
> > > > that we can all reproduce the issues you may be facing on other drivers
> > > > that don't use devm*().    
> > > 
> > > As I said, there's currently no way to properly release these resources as
> > > the driver won't receive a callback from media device release.  
> > 
> > If you're so convinced that it is needed and you won't be overriding
> > media device's struct device release callback, just add it. It should
> > be a 3 lines patch.  
> 
> Just the callback isn't enough. You need to get a reference to the kobject
> when the graph components may be accessed.

Yes. Or protect it with a mutex.

> Should we add reference counts to entities, we could add functions to get
> references to entities, and make the media device their parent. That'd be a
> largish change but it might not affect that many drivers after all.

Adding krefs to graph objects can be handled inside the core, except for
the drivers that implement their own graph traversal functions.
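
What a reference count on a graph object buys can be sketched in plain
userspace C. This is only an illustration under assumptions: `struct entity`,
`entity_get()` and `entity_put()` are hypothetical names, not the Media
controller API, and a plain int stands in for struct kref (so there is no
atomicity here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for a refcounted graph object. */
struct entity {
	int refcount;     /* plays the role of struct kref (not atomic here) */
	bool *released;   /* set to true when the release path runs */
};

/* Create an entity holding one reference, owned by the creator. */
static struct entity *entity_create(bool *released)
{
	struct entity *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->refcount = 1;
	e->released = released;
	*released = false;
	return e;
}

/* Take an extra reference, e.g. while a graph walk still uses the entity. */
static struct entity *entity_get(struct entity *e)
{
	assert(e->refcount > 0);
	e->refcount++;
	return e;
}

/* Drop a reference; the memory goes away only with the last one. */
static void entity_put(struct entity *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0) {
		*e->released = true;
		free(e);
	}
}
```

With this shape, removing an entity from the graph is just the owner dropping
its reference; anyone who obtained the entity earlier and took a reference
keeps it valid until their own put.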

> 
> > >   
> > > > 
> > > >     
> > > > > On Wed, Nov 09, 2016 at 03:46:08PM -0200, Mauro Carvalho Chehab wrote:    
> > > > > > Em Wed, 9 Nov 2016 10:00:58 -0700
> > > > > > Shuah Khan <shuahkh@osg.samsung.com> escreveu:
> > > > > >       
> > > > > > > > Maybe we can get the Media Device Allocator API work in and then we can
> > > > > > > > get your RFC series in after that. Here is what I propose:
> > > > > > > > 
> > > > > > > > - Keep the fixes in 4.9      
> > > > > > 
> > > > > > Fixes should always be kept. Reverting a fix is not an option.
> > > > > > Instead, do incremental patches on the top of it.
> > > > > >       
> > > > > > > > - Get Media Device Allocator API patches into 4.9.        
> > > > > > > 
> > > > > > > I meant 4.10 not 4.9
> > > > > > >       
> > > > > > > > - snd-usb-audio work goes into 4.10
> > > > > > 
> > > > > > Sounds like a plan.
> > > > > >       
> > > > > > > > Then your RFC series could go in. I am looking at the RFC series and that
> > > > > > > > the drivers need to change as well, so this RFC work could take longer.
> > > > > > > > Since we have to make media_device sharable, it is necessary to have a
> > > > > > > > global list approach Media Device Allocator API takes. So it is possible
> > > > > > > > for your RFC series to go on top of the Media Device Allocator API.      
> > > > > > 
> > > > > > Firstly, the RFC series should be converted into something that can
> > > > > > be applicable upstream, e. g.:
> > > > > > 
> > > > > > - doing the changes over the top of upstream, instead of needing to
> > > > > >   revert patches;      
> > > > > 
> > > > > The patches are in fact on top of the current media-tree, or were when they
> > > > > were sent (v4).
> > > > > 
> > > > > > I'm reverting these patches because they were merged not as a sound way
> > > > > > forward for the Media controller framework, but because they partially
> > > > > > worked around issues with a device being removed while it was in use.
> > > > > 
> > > > > > They never were a complete fix for these problems nor do I think they could
> > > > > be extended to be such. There were also unaddressed issues in these patches
> > > > > pointed out during the review. For these reasons I'm reverting the three
> > > > > patches. In more detail:
> > > > > 
> > > > > * media: fix media devnode ioctl/syscall and unregister race
> > > > >   6f0dd24a084a
> > > > > 
> > > > > The patch clears the registered bit before performing the steps related to
> > > > > unregistering a media device, but the bit is checked only at the beginning
> > > > > of the IOCTL call. As unregistering a device and an IOCTL call on a file
> > > > > handle of that device are not serialised, nothing guarantees the IOCTL call
> > > > > will finish with the registered bit still in the same state. Serialising the
> > > > > two e.g. by using a mutex is hardly a feasible solution for this.
> > > > > 
> > > > > I may have pointed out the original problem but this is not the solution.
> > > > > 
> > > > > <URL:http://www.spinics.net/lists/linux-media/msg101295.html>
> > > > > 
> > > > > The right solution is instead to make sure the data structures related to
> > > > > the media device will not disappear while the IOCTL call is in progress (at
> > > > > least).    
> > > > 
> > > > They won't. Device core won't call dev.release() while an ioctl doesn't
> > > > finish. So, the struct device and struct devnode will exist while the
> > > > ioctl (or any other fops) is handled.    
> > > 
> > > I believe you're right when it comes to drivers using video devices without
> > > Media controller. However, the Media devices and V4L2 sub-device nodes are
> > > another matter, as are the drivers. The drivers need to be able to rely on
> > > the frameworks to support them. On MC the driver simply has no way to
> > > release the media device at the right time. The same applies to V4L2
> > > sub-devices --- something that could be added to the patchset.
> > 
> > Huh? What's the sense of removing /dev/media0 and their associated
> > struct device before releasing the media graph?
> > 
> > The problem here is exactly the same as *any* other character device:
> > you need *first* to stop using whatever data struct is needed for
> > controlling /dev/media device and *then* removing /dev/media and
> > freeing their data structures, including struct device.  
> 
> I don't disagree about that particular point.
> 
> (Please see the beginning of the message as well.)
> 
> >   
> > > >     
> > > > > * media: fix use-after-free in cdev_put() when app exits after driver unbind
> > > > >   5b28dde51d0c
> > > > > 
> > > > > The patch avoids the problem of deleting a character device (cdev_del())
> > > > > after its memory has been released. The change is sound as such but the
> > > > > problem is addressed by another, a lot more simple patch in my series:
> > > > > 
> > > > > <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>    
> > > > 
> > > > Your approach is not clean, as it is based on a cdev's hack of doing:
> > > > 
> > > > 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
> > > > 
> > > > That is an ugly hack, as it touches inside cdev's internal stuff,
> > > > to do something that the driver's core doesn't expect. This is the
> > > > kind of patch that could cause messy errors, by cheating with the
> > > > cdev's internal refcount checking.
> > > > 
> > > > Btw, your approach requires changes on *all* drivers in order to make
> > > > device release work, which is way more complex than changing just the
> > > > core, as the current approach does.
> > > >     
> > > > > It might be possible to reasonably continue from here if the next patch to
> > > > > be reverted did not depend on this one.
> > > > > 
> > > > > * media-device: dynamically allocate struct media_devnode
> > > > > 
> > > > > This creates a two-way dependency between struct media_devnode and
> > > > > media_device. This is very much against the original design which clearly
> > > > > separates the two: media_devnode is entirely independent of media_device.    
> > > > 
> > > > Those structs are still independent.
> > > >     
> > > > > The original intent was that another sub-system in the kernel such as the
> > > > > V4L2 could make use of media_devnode as well and while that hasn't happened,
> > > > > perhaps the two could be merged. There simply are no other reasons to keep
> > > > > the two structs separate.
> > > > > 
> > > > > The patch is certainly a workaround, as it (partially, again) works around
> > > > > issues in timing of releasing memory and accessing it.
> > > > > 
> > > > > The proper solutions regarding the media_device and media_devnode are to either
> > > > > maintain the separation or unify the two; this patch does neither and
> > > > > suggests neither. To the contrary: it makes either of these impossible by
> > > > > design, and this reason alone is enough to revert it.
> > > > > 
> > > > > The set I'm pushing maintains the separation and leaves the option of either
> > > > > merging the two (media_device and media_devnode) or making use of
> > > > > media_devnode elsewhere open.    
> > > > 
> > > > As mentioned before, being based on a hack doesn't make it nice
> > > > for upstream merging.
> > > > 
> > > > The current approach uses the recommended way: the structure with
> > > > cdev embedded should be dynamically allocated. Well, we could merge
> > > > media_device and media_devnode, but, in this case, we'll need to
> > > > not embed media_device, in order to avoid hacks like the above.    
> > > 
> > > The current approach is simply not enough, be cdev allocated separately from
> > > media_devnode or not: the drivers have no way to properly release memory
> > > related to the media devices or the v4l2 sub-devices. That memory will get
> > > accessed through IOCTL calls: simply checking that a device was registered
> > > at one point does not mean it continues to be registered at another point in
> > > time in the future, unless the two operations are serialised one way or
> > > another.
> > 
> > Huh? The current approach relies on kref.
> >   
> > > >     
> > > > > > - change all drivers as the kAPI changes;      
> > > > > 
> > > > > The patchset actually adds new APIs rather than changing the OLD one --- as
> > > > > the old one was simply that drivers were responsible for allocating the data
> > > > > structures related to a media device. Existing drivers should continue to
> > > > > work as they did before without changes.    
> > > > 
> > > > Are you sure? Did you try the tests we did with binding/unbind, device
> > > > removal/insert and probe/remove of em28xx with your patches applied?    
> > > 
> > > I haven't tested that but as a matter of fact, I think I indeed have such
> > > device so I could test it. Changes on the DVB side would be needed as well
> > > in order to benefit from the API for allocating the media device.
> > >   
> > > > 
> > > > In that regard, you should really test it on a USB device, which
> > > > is hot-pluggable. There, you'll see a lot more memory lifetime issues
> > > > than on omap3.
> > > 
> > > I'm not so sure about USB devices: unbinding works the same way whether or not the
> > > device is actually hot-pluggable. Still, testing with different kinds of
> > > devices definitely does help to root out issues, that's for sure.
> > >   
> > > >     
> > > > > Naturally, to get the full benefits of the changes, driver changes will also be
> > > > > required (see the beginning of the message).    
> > > > 
> > > > The test cases we did work on em28xx. If a regression happens after any
> > > > patch of this series, you need to address it. I suspect that, even after applying
> > > > the entire series, there will still be regressions, as I don't see any
> > > > changes to em28xx in this patch series.
> > > 
> > > That's true, I've only changed the omap3isp driver so far as I wanted to get
> > > feedback on the framework changes.
> > >   
> > > >     
> > > > > The set has been posted as RFC in order to get reviews. It makes no sense to
> > > > > convert all the drivers and then start changing APIs, affecting all those
> > > > > converted drivers.    
> > > > 
> > > > Well, while it is not complete and still causes regressions, it can't be
> > > > considered ready for upstream review.
> > > >     
> > > > > > 
> > > > > > - be git bisectable, e. g. all patches should compile and run fine
> > > > > >   after each single patch, without introducing regressions.      
> > > > > 
> > > > > Compilation has already been tested (on ARM) on each patch applied in order.    
> > > > 
> > > > Good, but the best is to test it also on x86. Please notice that
> > > > just compiling doesn't ensure that it doesn't introduce regressions.
> > > > 
> > > > You should do your best to avoid regressions on every single patch
> > > > on your patch series.    
> > > 
> > > Certainly. Otherwise there would be fewer patches than there are
> > > now...
> > >   
> > > >     
> > > > > > 
> > > > > > That probably means that the series should be tested not only on
> > > > > > omap3, but also on some other device drivers.      
> > > > > 
> > > > > I fully agree with that. More review, testing and changes to at least some
> > > > > drivers (mostly for removable devices) will be needed before merging them,
> > > > > that's for sure.    
> > > > 
> > > > Good! One more point we agree :-)    
> > > 
> > > That's progress. It's a good start but we need more than that.
> > >   
> > 
> > Thanks,
> > Mauro
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-media" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html  
> 



Thanks,
Mauro


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-13 12:24                     ` Mauro Carvalho Chehab
@ 2016-12-13 22:23                       ` Shuah Khan
  2016-12-15 10:39                         ` Laurent Pinchart
  2016-12-15 11:30                       ` Sakari Ailus
  1 sibling, 1 reply; 89+ messages in thread
From: Shuah Khan @ 2016-12-13 22:23 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Sakari Ailus
  Cc: Sakari Ailus, linux-media, hverkuil, laurent.pinchart, Shuah Khan

Hi Sakari and Mauro,


On 12/13/2016 05:24 AM, Mauro Carvalho Chehab wrote:
> Em Tue, 13 Dec 2016 12:53:05 +0200
> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> 
>> Hi Mauro,
>>
>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>> Hi Sakari,
>>>
>>> I answered you point to point below, but I suspect that you missed how the 
>>> current approach works. So, I decided to write a quick summary here.
>>>
>>> The character devices /dev/media? are created via cdev, which relies on a
>>> kobject per device, which has an embedded struct kref inside.
>>>
>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>> struct device, when the code does:
>>> 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>
>>> before calling cdev_add().
>>>
>>> The current lifetime management is actually based on cdev's kobject's
>>> refcount, provided by its embedded kref.
>>>
>>> The kref guarantees that any data associated with /dev/media0 won't be
>>> freed if there are any pending system calls. In other words, when
>>> cdev_del() is called, it will remove /dev/media0 from the filesystem, and
>>> will call kobject_put(). 
>>>
>>> If the refcount is zero, it will call devnode->dev.release(). If the 
>>> kobject refcount is not zero, the data won't be freed.
>>>
>>> So, in the best case scenario, there are no open file descriptors
>>> by the time the media device node is unregistered. So, it will free
>>> everything.
>>>
>>> In the worst case scenario, e. g. when the driver is removed or
>>> unbound while /dev/media0 has some open file descriptor(s),
>>> the cdev logic will do the proper lifetime management.
>>>
>>> In such a case, /dev/media0 disappears from the file system, so another open
>>> is not possible anymore. The data structures will remain allocated until
>>> all associated file descriptors are closed.
>>>
>>> When all file descriptors are closed, the data will be freed.
>>>
>>> At that time, it will call an optional dev.release() callback,
>>> responsible for freeing any other data struct that the driver allocated.
>>
>> The patchset does not change this. It's not a question of the media_devnode
>> struct either. That's not an issue.
>>
>> The issue is rather what else can be accessed through the media device and
>> other interfaces. As IOCTLs are not serialised with device removal (which
>> now releases much of the data structures) 
> 
> Huh? ioctls are serialized with struct device removal. The driver core
> guarantees that.
> 
>> there's a high chance of accessing
>> released memory (or mutexes that have been already destroyed). An example of
>> that is here, stopping a running pipeline after unbinding the device. What
>> happens there is that the media device is released whilst it's in use
>> through the video device.
>>
>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
> 
> It is not clear from the logs what the driver tried to do, but
> that sounds like a driver bug: the driver was not prepared to properly
> handle unbinds.
> 
> The problem here is that isp_video_release() is called by V4L2
> release logic, and not by the MC one:
> 
> static const struct v4l2_file_operations isp_video_fops = {
> 	.owner		= THIS_MODULE,
> 	.open		= isp_video_open,
> 	.release	= isp_video_release,
> 	.poll		= vb2_fop_poll,
> 	.unlocked_ioctl	= video_ioctl2,
> 	.mmap		= vb2_fop_mmap,
> };
> 
> It seems that the driver's logic allows it to be called before or
> after destroying the MC.

Right, isp_video_release() will definitely be called after the driver is
gone, which means the media device is gone, and so is the device itself.

Both au0828 and em28xx have these release handlers. Neither one uses
devm resource for their device structs.

Also, both em28xx and au0828 keep a disconnected state and have logic
to detect the state of the driver and device. em28xx holds a reference
through v4l2->ref and releases the reference in em28xx_v4l2_close(), which
is its v4l2_file_operations .release handler. It also makes sure not to
touch device hardware if the device is disconnected.

Also, media graph access is done only when it has a valid media_device.
au0828 allocates media_device struct and it gets free'd when it does
its unregister sequence. Subsequent calls will check if it is null.
It also does checks to see if media_device is registered or not in
some cases.

isp_video_release() isn't safe to be called after the isp device is gone,
let alone media_device. Since isp is a devm resource, it is long
gone once device_release() releases managed resources.
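
The ordering problem can be sketched in plain userspace C. This is only an
illustration under assumptions, not kernel code: `struct fake_device`,
`fake_device_release()` and `driver_release()` are hypothetical stand-ins,
and the only behaviour modelled is the one described in this thread, that
managed (devm) resources are released before the release callback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device with one managed allocation. */
struct fake_device {
	void *managed;                           /* think devm_kzalloc() memory */
	void (*release)(struct fake_device *);   /* think dev->release() */
	bool managed_seen_in_release;            /* what the callback observed */
};

/* Models the order discussed above: managed resources first, callback last. */
static void fake_device_release(struct fake_device *dev)
{
	free(dev->managed);
	dev->managed = NULL;

	if (dev->release)
		dev->release(dev);
}

/* A driver release callback that would like to use its managed data. */
static void driver_release(struct fake_device *dev)
{
	dev->managed_seen_in_release = (dev->managed != NULL);
}
```

In this model the callback can only observe that the managed allocation is
already gone, so anything needed at release time cannot come from devm.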

I agree with Mauro that this is a driver problem. Mauro and I did a lot
of work to get the USB drivers (em28xx and au0828) to handle disconnect
and unbind cases even before the media controller support was added to
them.

I think what needs to happen is:

1. Remove devm use from omap3
2. Make sure media graph isn't accessed after media_device is unregistered
3. Take reference to v4l2 device to be able to make sanity checks from
   isp_video_release() to determine if media_device is still around and
   then do stop stream etc. It has to keep state.
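
The three steps above can be sketched in plain userspace C. This is a rough
model under assumptions, not code from omap3isp, em28xx or au0828: every
`fake_*` name is hypothetical, a plain int stands in for a kref, and the
counters exist only to make the behaviour observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int graph_accesses;   /* how often the (fake) media graph was touched */
static int live_devices;     /* how many fake_dev structures are allocated */

/* Hypothetical driver state; not a real driver structure. */
struct fake_dev {
	int refcount;        /* stands in for a kref (not atomic here) */
	bool disconnected;   /* set once the media device is unregistered */
};

/* probe(): plain allocation instead of devm (step 1). */
static struct fake_dev *fake_dev_probe(void)
{
	struct fake_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;   /* reference held by the bound driver */
	live_devices++;
	return dev;
}

/* Drop a reference; the state is freed only with the last one. */
static void fake_dev_put(struct fake_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		live_devices--;
		free(dev);
	}
}

/* open(): every file handle takes its own reference (step 3). */
static void fake_video_open(struct fake_dev *dev)
{
	dev->refcount++;
}

/* unbind: record that the device is gone, then drop the driver's reference. */
static void fake_dev_disconnect(struct fake_dev *dev)
{
	dev->disconnected = true;
	fake_dev_put(dev);
}

/* release(): touch the graph only while the device is still there (step 2). */
static void fake_video_release(struct fake_dev *dev)
{
	if (!dev->disconnected)
		graph_accesses++;   /* e.g. stop streaming through the pipeline */
	fake_dev_put(dev);
}
```

The point of the sketch is that the state outlives the unbind for as long as
a file handle is open, and the late release handler skips the graph.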

Please don't pursue this RFC series that makes mc-core changes until
omap3 driver problems are addressed. There is no need to change the
core unless it is necessary.

I would be happy to help, but unfortunately I don't have an omap3 device
to fix and test problems. I am unable to find any omap3 devices. The
one I have isn't good.

thanks,
-- Shuah


> 
> Assuming that, if the OMAP3 driver is not used it works,
> it means that, if the isp_video_release() is called
> first, no errors will happen, but if MC is destroyed before
> V4L2's call to its .release() callback, as there's no logic in the
> driver that would detect it, isp_video_release() will be calling
> isp_video_streamoff(), which depends on the MC to work.
> 
> On a first glance, I can see two ways of fixing it:
> 
> 1) to increment devnode's device kobject refcount at OMAP3 .probe(), 
> decrementing it only at isp_video_release(). That will ensure that
> MC will only be removed after V4L2 removal.
> 
> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
> inside the MC .release() callback. 
> 
> That could be done by overriding the dev.release() callback in the
> omap3 driver, as I discussed in my past e-mails, and flagging to the
> driver that it should not accept streamon anymore, as the hardware
> is being disconnected.
> 
> Btw, that explains a lot why Shuah can't reproduce the stuff you're
> complaining about on her USB hardware.
> 
> The USB subsystem has a .disconnect() callback that notifies
> the drivers that a device was unbound (likely physically removed).
> The way USB media drivers handle it is by returning -ENODEV to any
> V4L2 call that would try to touch the hardware after unbind.
> 
> So, on au0828, there's no need to add any extra release logic.
> 
>> <URL:http://www.spinics.net/lists/linux-media/msg108943.html>
>>
>>>
>>> Em Mon, 28 Nov 2016 12:45:56 +0200
>>> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
>>>   
>>>> Hi Mauro,
>>>>
>>>> On Tue, Nov 22, 2016 at 03:44:29PM -0200, Mauro Carvalho Chehab wrote:  
>>>>> Em Mon, 14 Nov 2016 15:27:22 +0200
>>>>> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
>>>>>     
>>>>>> Hi Mauro,
>>>>>>
>>>>>> I'm replying below but let me first summarise the remaining problem area
>>>>>> that this patchset addresses.    
>>>>>
>>>>> Sorry for answering too late. Somehow, I missed this email in the cloud.
>>>>>     
>>>>>> The problems you and Shuah have seen and partially addressed are related to
>>>>>> a larger picture which is the lifetime of (mostly) memory resources related
>>>>>> to various objects used by both the Media controller and V4L2
>>>>>> frameworks (including videobuf2) and the drivers which make use of these
>>>>>> frameworks.
>>>>>>
>>>>>> The Media controller and V4L2 interfaces exposed by drivers consist of
>>>>>> multiple device nodes, data structures with interdependencies within the
>>>>>> frameworks themselves and dependencies from the driver's own data structures
>>>>>> towards the framework data structures. The Media device and the media graph
>>>>>> objects are central to the problem area as well.
>>>>>>
>>>>>> So what are the issues then? Until now, we've attempted to regulate the
>>>>>> users' ability to access the devices at the time they're being unregistered
>>>>>> (and the associated memory released), but that approach does not really
>>>>>> scale: you have to make sure that the unregistering also will not take place
>>>>>> _during_ the system call --- not just in the beginning of it.
>>>>>>
>>>>>> The media graph contains media graph objects, some of which are media
>>>>>> entities (contained in struct video_device or struct v4l2_subdev, for
>>>>>> instance). Media entities as graph nodes have links to other entities. In
>>>>>> order to implement the system calls, the drivers parse this graph to
>>>>>> obtain the information they need from it. For instance, it's
>>>>>> not uncommon for an implementation of video node format enumeration to
>>>>>> figure out which sub-device the link from that video node leads to. Drivers
>>>>>> may also have similar paths they follow.
>>>>>>
>>>>>> Interrupt handling may also be taking place during the device removal during
>>>>>> which a number of data structures are now freed. This really does call for a
>>>>>> solution based on reference counting.
>>>>>>
>>>>>> This leads to the conclusion that all the memory resources that could be
>>>>>> accessed by the drivers or the kernel frameworks must stay intact until the
>>>>>> last file handle to the said devices is closed. Otherwise, there is a
>>>>>> possibility of accessing released memory.    
>>>>>
>>>>> So far, we're aligned.
>>>>>     
>>>>>> Right now in a lot of the cases, such as for video device and sub-device
>>>>>> nodes, we do release the memory when a device (as in struct device) is being
>>>>>> unregistered. There simply is no way in the current mainline kernel to do
>>>>>> this safely.
>>>>>     
>>>>>> Drivers do use devm_() family of functions to allocate
>>>>>> the memory of the media graph object and their internal data structures.    
>>>>>
>>>>> Removing devm_() from those drivers seems to be the first thing to do,
>>>>> and it is independent from any MC rework.
>>>>>
>>>>> As you'll see below, we have different opinions on other matters,
>>>>> so, my suggestion about how to proceed is that you should submit
>>>>> first the things we're aligned on.
>>>>>
>>>>> In other words, please submit the patches that get rid of devm_()
>>>>> first. Then, we can address the remaining stuff.    
>>>>
>>>> Removing devm_*() is needed, but when should the memory be released then?
>>>> There's no callback currently from the media device the driver could use.  
>>>
>>> It should be easy to add a release callback if you need. Yet, I think you
>>> don't need a callback for that. Instead, you could just use the already
>>> existing one at struct device, e. g. export media_devnode_release() and,
>>> on drivers that need to release additional data, you would be doing something
>>> like:
>>>
>>> 	static void my_devnode_release(struct device *cd)
>>> 	{
>>> 		// Some code that would release things before kfree(dev)
>>> 		kthread_stop(foo_thread);
>>> 		kfree(foo);
>>>
>>> 		// will internally do a kfree(dev)
>>> 		media_devnode_release(cd);
>>>
>>> 		// Some code that would release things after kfree(dev)
>>> 		kfree(bar);
>>> 	}  
>>
>> I think we really want to make correct implementations easy for drivers, not
>> requiring e.g. to use the media_devnode interface directly. As device
>> removal isn't serialised with IOCTLs, every driver should do this in order
>> to prevent device driver's / framework IOCTL handlers operating on released
>> memory.
> 
> Well, it would be easy to add a callback that media_devnode_release()
> would call on drivers that need it.
> 
> As I said before, USB drivers don't need anything extra at devnode
> release. I'd say more, even PCI drivers won't likely need it, as they
> don't use MC to do things like streamoff.
> 
> I suspect that such special .release() logic is only needed on drivers
> that don't work without MC, e. g. subdev-based ones.
> 
>>
>>>
>>> And set the new release callback after registering the media device with:
>>>
>>> 	media_device_register(...);
>>> 	devnode->dev.release = my_devnode_release;
>>>
>>> The advantage of such approach is that it allows to control the order
>>> where things will be freed/released.  
>>
>> That's among the things the patchset does, but I think in a much nicer way.
> 
> A 21-patch series that breaks release on all drivers but OMAP3 doesn't seem
> to be a cleaner/nicer approach.
> 
>>>> OTOH devm_*() interfaces are very convenient to use, it's a lot of extra
>>>> work for drivers to handle releasing all the resources. It'd be great to
>>>> find another object where to bind those resources. Still, device_release()
>>>> does first release devres resources and then calls the release() callback,
>>>> which obviously makes the setup problematic to begin with.  
>>>
>>> Shuah's approach is providing another way to bind things. Yet, maybe
>>> it could still be possible to use devm_*(), if it has a way to
>>> control when devm will free their resources. I suspect that, if you call
>>> devm_free() during dev.release() callback, or if you use the same struct
>>> device that is associated with the cdev, devm will work.  
>>
>> I wonder if we could use the media_devnode cdev's struct device to bind this
>> stuff to. It'd be gone when there's a certainty it'll no longer be needed.
> 
> Maybe, but the real problem here is that some data are associated to
> MC, while others are associated with V4L2. 
> 
> If you can identify what data is associated to MC, and provide a way
> to handle MC ".disconnect()" so that V4L2 won't be trying to use the
> MC-related data, then it would be safe to use devm to allocate memory.
> 
>> The caveat is the release callback is called after the devres resources have
>> been released. So if a driver requires also the release callback, then it
>> has no longer access to memory allocated using devm_*() functions. I'd like
>> to have Laurent's opinion on this.
>>
>> This solution is no longer enough when we have media devices where you can
>> remove entities, as those would only be released when the entire device is
>> gone. Or, there's a memory leak until removal of the media device. I don't
>> like that albeit there might be still very few practical problems.
> 
> Agreed.
> 
>>
>>>   
>>>>>     
>>>>>>
>>>>>> With this patchset:
>>>>>>
>>>>>> - The media_device which again contains the media_devnode is allocated
>>>>>>   dynamically. The lifetime of the media device --- and the media graph
>>>>>>   objects it contains --- is bound to device nodes that are bound to the
>>>>>>   media device (video and sub-device nodes) as well as open file handles.    
>>>>>
>>>>> No. Data structures with cdev embedded into them have their lifetime
>>>>> controlled by the driver's core, and are destroyed only when there's
>>>>> no pending fops. The current approach uses device's core dev.release()    
>>>>
>>>> Fair enough; that part is indeed handled towards the user space as far as I
>>>> can tell. However that's still not enough: the media graph contains the
>>>> graph objects, and the media device that holds the graph, must outlive the
>>>> graph objects themselves.  
>>
>> I meant to say that the media device, media graph and media graph objects
>> must stay around as long as they may be accessed from the user space. For
>> instance, the user may have a file handle opened from a video device, and the
>> media graph may be accessed through that file handle on media controller
>> enabled drivers. That's just one example.
>>
>> This is what happens if you stop streaming in a pipeline after unbinding the
>> driver implementing the media device (same log as above):
>>
>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
> 
> As I said, if this is a requirement, you could increase kobject's 
> reference. If not, you could just stop streaming when the media device
> has gone.
> 
>>
>>>
>>> Sorry, didn't follow you here. What's the sense of not freeing the media
>>> graph before destroying the struct device associated with /dev/media0? 
>>> In other words, what should outlive after chardev's data is freed?
>>>
>>> Please notice that the driver's core kobject kref ensures that the device
>>> release code is called only after all file descriptors are closed, and
>>> no other syscall would affect the cdev.
>>>   
>>>> Also removing entities doesn't really work currently: touching an entity, a
>>>> link or any kind of a graph object is not guaranteed to work unless you hold
>>>> the media graph lock. And that's simply unfeasible.  
>>>
>>> Sorry, again, didn't follow you here. The current strategy for adding
>>> and removing things in the graph relies on a lock, which serializes access
>>> to the graph, in order to avoid races if someone is trying to navigate
>>> the graph while an object is being inserted or removed.  
>>
>> The graph mutex is taken during graph walk but nothing guarantees that, say,
>> an entity that was obtained during the graph walk will stay around once the
>> graph mutex is released.
>>
>> An alternative would be to add refcounts to entities. That'd allow removing
>> graph objects safely during the media device lifetime. The streaming would
>> certainly need to be stopped first though.
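
To make the refcount alternative above concrete, here is a minimal userspace
sketch, not kernel code. Everything in it (struct entity, graph_find_get(),
the "locked" flag standing in for the graph mutex) is made up for
illustration; it only tries to show the idea that a reference taken while
the lock is held keeps the entity valid after the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model, not the media controller API. */
struct entity {
	int refcount;	/* plain int: this model is single-threaded */
	int id;
};

struct graph {
	bool locked;			/* stands in for the graph mutex */
	struct entity *entities[4];
};

static int entities_freed;		/* lets the example observe releases */

static void graph_lock(struct graph *g)
{
	assert(!g->locked);
	g->locked = true;
}

static void graph_unlock(struct graph *g)
{
	assert(g->locked);
	g->locked = false;
}

static void entity_put(struct entity *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0) {
		entities_freed++;
		free(e);
	}
}

/* Add an entity; the graph itself holds one reference to it. */
static struct entity *graph_add(struct graph *g, int slot, int id)
{
	struct entity *e = malloc(sizeof(*e));

	if (!e)
		return NULL;
	e->refcount = 1;
	e->id = id;
	graph_lock(g);
	g->entities[slot] = e;
	graph_unlock(g);
	return e;
}

/* Look up under the lock and take a reference before unlocking. */
static struct entity *graph_find_get(struct graph *g, int id)
{
	struct entity *found = NULL;
	int i;

	graph_lock(g);
	for (i = 0; i < 4; i++) {
		struct entity *e = g->entities[i];

		if (e && e->id == id) {
			e->refcount++;
			found = e;
			break;
		}
	}
	graph_unlock(g);
	return found;
}

/* Remove from the graph: drops only the graph's own reference. */
static void graph_remove(struct graph *g, int slot)
{
	struct entity *e;

	graph_lock(g);
	e = g->entities[slot];
	g->entities[slot] = NULL;
	graph_unlock(g);
	if (e)
		entity_put(e);
}

/* Returns the id read after removal, or -1 if something went wrong. */
static int entity_ref_example(void)
{
	struct graph g = { 0 };
	struct entity *e;
	int id;

	entities_freed = 0;
	if (!graph_add(&g, 0, 42))
		return -1;
	e = graph_find_get(&g, 42);	/* caller now holds a reference */
	if (!e)
		return -1;
	graph_remove(&g, 0);		/* entity leaves the graph, not freed */
	if (entities_freed != 0)
		return -1;
	id = e->id;			/* still valid: we hold a reference */
	entity_put(e);			/* last reference, memory released */
	return entities_freed == 1 ? id : -1;
}
```

In this model the entity is removed from the graph while a user still holds
it, and the memory is only released on the final entity_put(). Whether
streaming must be stopped first is a separate question the sketch does not
address.
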
> 
> My first MC patch series added refcounts to all graph objects[1] ;)
> 
> [1] https://patchwork.linuxtv.org/patch/30766/
> 
> My original idea was to increment entities' kref when links were
> created. This way, it would be easier to clean up stuff, as they
> could be destroyed in any order, especially with dynamic entity
> creation/removal. Also, graph traversal could increment links'
> krefs to prevent them from being destroyed while navigating on them,
> without needing to keep the media lock held for a long time.
> 
> I removed it on the second version because Laurent was unable to see
> any usage for that, but, IMHO, if properly implemented, it would
> help to support dynamic entities removal/insertion.
> 
> So, I don't mind if someone would send a patch adding it again.
> 
>>> It could be converted into a lockless approach (for example, using RCU),
>>> but this is a separate issue.  
>>
>> V4L2 sub-devices, besides an entity, may contain a device node as well. The
>> data structures span multiple drivers and may span multiple sub-systems
>> (think of ALSA) as well. The media entity is embedded in a sub-device data
>> structure allocated by drivers. Drivers, also other drivers that walk the
>> media graph, do make use of this knowledge to obtain sub-devices and access
>> controls in them.
> 
> Well, a kref-based approach would avoid locks most of the time.
> Perhaps it could be combined with RCU.
> 
>>
>>>
>>> The removal code needs to use whatever lock (or lockless) schema we
>>> use to serialize the access to the graph.
>>>   
>>>> Just look at what the
>>>> drivers do with entities: they use the v4l2_subdev interface and the control
>>>> framework to access them.
>>>>
>>>> These data structures contain struct media_entity in them, and that entity
>>>> is part of the media graph. Other drivers use entities e.g. to obtain
>>>> control values from them. References should be used to prevent releasing the
>>>> memory.  
>>>
>>> References are used by the driver's core, using kobject_get() and
>>> kobject_put(). That warrants that dev.release() will only be called
>>> when nobody is using it anymore.  
>>
>> Yes, but this does not reach entities. Their lifetime is not related to
>> that.
> 
> No. That's why, currently, we need to lock before adding/removing
> graph objects.
> 
>>>   
>>>> media_entity_get() and media_entity_put() do not do what you'd expect.  
>>>
>>> Please elaborate.  
>>
>> The functions simply get / put the module that owns the media entity. The
>> entities as such are not refcounted, and acquiring the driver's module does
>> not guarantee the entities aren't released.
> 
> Yes.
> 
>>>   
>>>> v4l2_subdev_call() should also verify that a sub-device is registered, and
>>>> make sure it will stay that way for the duration of call: the driver must be
>>>> able to expect the entity is accessible as the driver registered it.  
>>>
>>> Yes, but I can't see how this is related to this discussion. Before
>>> unregistering struct device, you need to unbind the subdevs.
>>>
>>> The only case I can see of calling v4l2_subdev_call() after all file
>>> descriptors are closed is if you have some kthread running. You need to 
>>> call kthread_stop() for such kthreads before freeing struct device.
>>>
>>> You could do it at a my_devnode_release() if you need the kthread running
>>> even after closing all file descriptors, or even before that, before
>>> calling media_device_unregister().
>>>   
>>>> The same goes for the control framework.  
>>>
>>> I don't think we have kthreads for controls. The control routines
>>> are called only when a file descriptor is opened. So, I don't see
>>> any possible issue with the control framework.  
>>
>> This isn't about kthreads; other drivers do this as well through
>> user-initiated actions. Such as starting or stopping streaming.
> 
> As explained before, either streamoff should happen during MC removal
> or you need to increment kobject refcount to serialize the removal
> order.
> 
>>>   
>>>> As far as I remember, we somehow assumed that just acquiring the related
>>>> kernel modules would be enough to counter this but it is not.  
>>>
>>> Well, if not, you could use kobject_get() and kobject_put() to increment
>>> or decrement the cdev's refcount. Yet, I suspect that, if the drivers are
>>> properly designed, you won't need to manually touch the kref.
>>
>> Entities are not refcounted. You can't get a kobject as there's none to get.
> 
> Entity removal is protected via mutex.
> 
>>>   
>>>>
>>>> I would prefer to postpone this however, the patchset already does enough
>>>> for a single patchset. Fixing this properly would likely require wait/wound
>>>> mutexes for individual entities.
>>>>   
>>>>> callback to release memory.
>>>>>
>>>>> In other words, dev.release() is only called after the driver's base
>>>>> knows that the cdev is not in use anymore. So, no ioctl() or any
>>>>> other syscalls on that point.
>>>>>
>>>>> Ok, nothing prevents some driver from doing the wrong thing, keeping a
>>>>> copy of struct device and using it after free, for example storing
>>>>> it in devm-allocated memory, and printing some debug message
>>>>> after struct device is freed, but this is a driver's bug.
>>>>>
>>>>> What really worries me on this series is that it seemed that you
>>>>> didn't understand how the current approach works. So, you decided
>>>>> to just revert it and start from scratch. This is dangerous, as
>>>>> it could cause problems to other scenarios than yours.    
>>>>
>>>> I'm not quite sure what do you mean.
>>>>
>>>> It may well be that the patchset will require changes but that's precisely
>>>> the reason why patches are reviewed before merging.  
>>>
>>> From your comments and from your code, you didn't seem to realize that
>>> the current approach relies on the struct device refcount. See above.
>>
>> That refcount is only for struct media_devnode. It's simply not enough, as
>> I've elaborated:
>>
>> - No serialisation between IOCTL and releasing media device memory.
>>
>> 	- This causes that once the IOCTL call has begun, media device may
>> 	  be released, and this released memory can be accessed by the IOCTL
>> 	  handler.
>>
>> - Drivers and frameworks that access the media device through other device
>>   nodes such as V4L2 devices will also access released memory.
>>
>> There could be others.
> 
> See above.
> 
> 
>>
>>>   
>>>>>     
>>>>>> - Care is taken that the unregistration process and releasing memory happens
>>>>>>   in the right order. This was not always the case previously.    
>>>>>
>>>>> Freeing memory for struct media_devnode, struct device and struct cdev 
>>>>> is currently handled by the driver's core, when it is known to be safe,
>>>>> and using the same logic that other subsystems do.    
>>>>
>>>> That's simply not the case. Other sub-systems do not have graphs managed by
>>>> multiple device drivers for multiple physical devices that expose device
>>>> nodes through which all of those devices can be accessed. The problem domain
>>>> is far more complex than if you had a single physical device for which a
>>>> driver would expose a device node or two to the user space.  
>>>
>>> No. The current approach uses the struct device associated with /dev/media0,
>>> created via cdev, to provide a refcount for the data associated with the
>>> character device.
>>>
>>> The struct device kobject refcount ensures that everything associated
>>> with it will only be freed after the refcount goes to zero.
>>>
>>> As I said before, if there are any cases where the refcount is going
>>> early to zero, it is just a matter of adding a few kobject_get() and
>>> kobject_put() to ensure that this won't happen early, if the driver is
>>> so broken that it is unable to do the right refcount.  
>>
>> That's correct, but it only applies to struct media_devnode. Nothing else.
>> Please see above.
> 
> Well, don't remove entities before stop using them.
> 
>>>>> We might do it different, but we need a strong reason to do it, as
>>>>> going away from the usual practice is dangerous.    
>>>>
>>>> I think we already did that when we merged the original Media controller and
>>>> V4L2 sub-device patches...
>>>>   
>>>>>     
>>>>>> - The driver remains responsible for the memory of the video and sub-device
>>>>>>   nodes. However, now the Media controller provides a convenient callback to
>>>>>>   the driver to release any memory resources when the time has come to do
>>>>>>   so. This takes place just before the media device memory is released.    
>>>>>
>>>>> Drivers could use devnode->dev.release for that. Of course, if they
>>>>> override it, they should be calling media_devnode_release() on their
>>>>> internal release functions.    
>>>>
>>>> That'd be really hackish. The drivers currently don't deal with
>>>> media_devnode directly now and I don't think they should be obliged to.  
>>>
>>> I'm not against adding a callback instead. However, that makes it lose
>>> flexibility, as the callback will either be called before or after
>>> freeing struct device.
>>>
>>> By overriding the dev.release callback, we have a finer control.
>>>
>>> If you don't see any case where we'll be freeing data after freeing
>>> struct device, then a callback would work.
>>>   
>>>>>     
>>>>>> - Drivers that do not strictly need to be removable require no changes. The
>>>>>>   benefits of this set become tangible for any driver by changing how the
>>>>>>   driver allocates memory for the data structures. Ideally at least
>>>>>>   drivers for hot-removable devices should be converted.    
>>>>>
>>>>> Drivers should allow device removal and/or driver removal. If you're
>>>>> doing any change here, you need to touch *all* drivers to use the new 
>>>>> way.    
>>>>
>>>> Let's first agree on what needs to be fixed and how, and then think about
>>>> converting the drivers. Buggy code has a tendency to continue to be buggy
>>>> unless it is fixed (or replaced).  
>>>
>>> True, but as I said, this series creates buggy code when it ignores what
>>> was fixed already. Also, a patch series to be considered ready for
>>> upstream needs to make the needed changes on all drivers it affects.
>>>   
>>>>>> In order to make the current drivers to behave well it is necessary to make
>>>>>> changes to how memory is allocated in the drivers. If you look at the sample
>>>>>> patches that are part of the set for the omap3isp driver, you'll find that
>>>>>> around 95% of the changes are related to removing the use of the devm_() family
>>>>>> of functions instead of Media controller API changes. In this regard, the
>>>>>> approach taken here requires very little if any additional overhead.    
>>>>>
>>>>> Well, send the patches that do the 95% of the changes first e. g. devm_()
>>>>> removal, and check if you aren't using any dev_foo() printk after
>>>>> unregister, and send such patch series, without RFC. Then test what's
>>>>> still broken, if any and let's discuss with your results, in a way
>>>>> that we can all reproduce the issues you may be facing on other drivers
>>>>> that don't use devm*().    
>>>>
>>>> As I said, there's currently no way to properly release these resources as
>>>> the driver won't receive a callback from media device release.  
>>>
>>> If you're so convinced that it is needed and you won't be overriding
>>> media device's struct device release callback, just add it. It should
>>> be a 3 lines patch.  
>>
>> Just the callback isn't enough. You need to get a reference to the kobject
>> when the graph components may be accessed.
> 
> Yes. Or protect it with a mutex.
> 
>> Should we add reference counts to entities, we could add functions to get
>> references to entities, and make the media device their parent. That'd be a
>> largish change but it might not affect that many drivers after all.
> 
> Adding krefs to graph objects can be handled inside the core, except for
> the drivers that implement their own graph traversal functions.
> 
>>
>>>>   
>>>>>
>>>>>     
>>>>>> On Wed, Nov 09, 2016 at 03:46:08PM -0200, Mauro Carvalho Chehab wrote:    
>>>>>>> Em Wed, 9 Nov 2016 10:00:58 -0700
>>>>>>> Shuah Khan <shuahkh@osg.samsung.com> escreveu:
>>>>>>>       
>>>>>>>>> Maybe we can get the Media Device Allocator API work in and then we can
>>>>>>>>> get your RFC series in after that. Here is what I propose:
>>>>>>>>>
>>>>>>>>> - Keep the fixes in 4.9      
>>>>>>>
>>>>>>> Fixes should always be kept. Reverting a fix is not an option.
>>>>>>> Instead, do incremental patches on the top of it.
>>>>>>>       
>>>>>>>>> - Get Media Device Allocator API patches into 4.9.        
>>>>>>>>
>>>>>>>> I meant 4.10 not 4.9
>>>>>>>>       
>>>>>>>>> - snd-usb-audio work go into 4.10
>>>>>>>
>>>>>>> Sounds like a plan.
>>>>>>>       
>>>>>>>>> Then your RFC series could go in. I am looking at the RFC series and that
>>>>>>>>> the drivers need to change as well, so this RFC work could take longer.
>>>>>>>>> Since we have to make media_device sharable, it is necessary to have a
>>>>>>>>> global list approach Media Device Allocator API takes. So it is possible
>>>>>>>>> for your RFC series to go on top of the Media Device Allocator API.      
>>>>>>>
>>>>>>> Firstly, the RFC series should be converted into something that can
>>>>>>> be applicable upstream, e. g.:
>>>>>>>
>>>>>>> - doing the changes over the top of upstream, instead of needing to
>>>>>>>   revert patches;      
>>>>>>
>>>>>> The patches are in fact on top of the current media-tree, or were when they
>>>>>> were sent (v4).
>>>>>>
>>>>>> The reason I'm reverting patches is that the reason why these patches were
>>>>>> merged was not because they would have been a sound way forward for the
>>>>>> Media controller framework, but because they partially worked around issues
>>>>>> in a device being in use while it was removed.
>>>>>>
>>>>>> They never were a complete fix for these problems nor do I think they could
>>>>>> be extended to be such. There were also unaddressed issues in these patches
>>>>>> pointed out during the review. For these reasons I'm reverting the three
>>>>>> patches. In more detail:
>>>>>>
>>>>>> * media: fix media devnode ioctl/syscall and unregister race
>>>>>>   6f0dd24a084a
>>>>>>
>>>>>> The patch clears the registered bit before performing the steps related to
>>>>>> unregistering a media device, but the bit is checked only at the beginning
>>>>>> of the IOCTL call. As unregistering a device and an IOCTL call on a file
>>>>>> handle of that device are not serialised, nothing guarantees the IOCTL call
>>>>>> will finish with the registered bit still in the same state. Serialising the
>>>>>> two e.g. by using a mutex is hardly a feasible solution for this.
>>>>>>
>>>>>> I may have pointed out the original problem but this is not the solution.
>>>>>>
>>>>>> <URL:http://www.spinics.net/lists/linux-media/msg101295.html>
>>>>>>
>>>>>> The right solution is instead to make sure the data structures related to
>>>>>> the media device will not disappear while the IOCTL call is in progress (at
>>>>>> least).    
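
The difference between checking the registered bit once and pinning the
memory for the duration of the call can be sketched in userspace as below.
This is only a model under stated assumptions: struct mdev, struct mfile and
all functions are hypothetical, the mid_call hook stands in for an unbind
racing with the handler, and there is no real concurrency involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a media device; not the kernel structure. */
struct mdev {
	int refcount;
	bool registered;
	int num_entities;	/* data an ioctl handler would read */
};

/* Hypothetical open file handle: it pins the device with a reference. */
struct mfile {
	struct mdev *mdev;
};

static int mdev_released;	/* lets the example observe the release */

static struct mdev *mdev_alloc(int num_entities)
{
	struct mdev *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	m->refcount = 1;	/* reference owned by the registration */
	m->registered = true;
	m->num_entities = num_entities;
	return m;
}

static void mdev_put(struct mdev *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0) {
		mdev_released++;
		free(m);
	}
}

/* Unregistering only unregisters: memory goes with the last reference. */
static void mdev_unregister(struct mdev *m)
{
	m->registered = false;
	mdev_put(m);
}

static struct mfile *mfile_open(struct mdev *m)
{
	struct mfile *f;

	if (!m->registered)
		return NULL;
	f = malloc(sizeof(*f));
	if (!f)
		return NULL;
	m->refcount++;
	f->mdev = m;
	return f;
}

static void mfile_close(struct mfile *f)
{
	mdev_put(f->mdev);
	free(f);
}

/* mid_call, if set, models an unbind arriving after the registered check. */
static int mfile_ioctl(struct mfile *f, void (*mid_call)(struct mdev *))
{
	struct mdev *m = f->mdev;

	if (!m->registered)
		return -1;
	if (mid_call)
		mid_call(m);
	return m->num_entities;	/* valid: the file handle holds a reference */
}

/* Returns 0 if the model behaves as described, -1 otherwise. */
static int ioctl_race_example(void)
{
	struct mdev *m;
	struct mfile *f;
	int during, released_early, after;

	mdev_released = 0;
	m = mdev_alloc(3);
	if (!m)
		return -1;
	f = mfile_open(m);
	if (!f) {
		mdev_put(m);
		return -1;
	}
	during = mfile_ioctl(f, mdev_unregister);	/* unbind mid-call */
	released_early = mdev_released;
	after = mfile_ioctl(f, NULL);			/* now unregistered */
	mfile_close(f);					/* last reference */
	return (during == 3 && released_early == 0 &&
		after == -1 && mdev_released == 1) ? 0 : -1;
}
```

In the model the registered flag flips in the middle of the first call, which
is exactly the case the one-time check cannot catch, yet the handler still
reads valid memory because the open file handle holds a reference.
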
>>>>>
>>>>> They won't. Device core won't call dev.release() while an ioctl doesn't
>>>>> finish. So, the struct device and struct devnode will exist while the
>>>>> ioctl (or any other fops) is handled.    
>>>>
>>>> I believe you're right when it comes to drivers using video devices without
>>>> Media controller. However the Media devices and V4L2 sub-device nodes are
>>>> another matter, as are the drivers. The drivers need to be able to rely on
>>>> the frameworks to support them. On MC the driver simply has no way to
>>>> release the media device at the right time. The same applies to V4L2
>>>> sub-devices --- something that could be added to the patchset.  
>>>
>>> Huh? What's the sense of removing /dev/media0 and their associated
>>> struct device before releasing the media graph?
>>>
>>> The problem here is exactly the same as *any* other character device:
>>> you need *first* to stop using whatever data struct is needed for
>>> controlling /dev/media device and *then* removing /dev/media and
>>> freeing their data structures, including struct device.  
>>
>> I don't disagree about that particular point.
>>
>> (Please see the beginning of the message as well.)
>>
>>>   
>>>>>     
>>>>>> * media: fix use-after-free in cdev_put() when app exits after driver unbind
>>>>>>   5b28dde51d0c
>>>>>>
>>>>>> The patch avoids the problem of deleting a character device (cdev_del())
>>>>>> after its memory has been released. The change is sound as such but the
>>>>>> problem is addressed by another, a lot more simple patch in my series:
>>>>>>
>>>>>> <URL:http://git.retiisi.org.uk/?p=~sailus/linux.git;a=commitdiff;h=26fa8c1a3df5859d34cef8ef953e3a29a432a17b>    
>>>>>
>>>>> Your approach is not clean, as it is based on a cdev's hack of doing:
>>>>>
>>>>> 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>>>
>>>>> That is an ugly hack, as it touches inside cdev's internal stuff,
>>>>> to do something that the driver's core doesn't expect. This is the
>>>>> kind of patch that could cause messy errors, by cheating with the
>>>>> cdev's internal refcount checking.
>>>>>
>>>>> Btw, your approach require changes on *all* drivers, in order to make
>>>>> device release work, which is way more complex than changing just the
>>>>> core, as the current approach does.
>>>>>     
>>>>>> It might be possible to reasonably continue from here if the next patch to
>>>>>> be reverted did not depend on this one.
>>>>>>
>>>>>> * media-device: dynamically allocate struct media_devnode
>>>>>>
>>>>>> This creates a two-way dependency between struct media_devnode and
>>>>>> media_device. This is very much against the original design which clearly
>>>>>> separates the two: media_devnode is entirely independent of media_device.    
>>>>>
>>>>> Those structs are still independent.
>>>>>     
>>>>>> The original intent was that another sub-system in the kernel such as the
>>>>>> V4L2 could make use of media_devnode as well and while that hasn't happened,
>>>>>> perhaps the two could be merged. There simply are no other reasons to keep
>>>>>> the two structs separate.
>>>>>>
>>>>>> The patch is certainly a workaround, as it (partially, again) works around
>>>>>> issues in timing of releasing memory and accessing it.
>>>>>>
>>>>>> The proper solutions regarding the media_device and media_devnode are either
>>>>>> to maintain the separation or to unify the two, and this patch neither does nor
>>>>>> suggests either of these. To the contrary: it makes either of these impossible by
>>>>>> design, and this reason alone is enough to revert it.
>>>>>>
>>>>>> The set I'm pushing maintains the separation and leaves the option of either
>>>>>> merging the two (media_device and media_devnode) or making use of
>>>>>> media_devnode elsewhere open.    
>>>>>
>>>>> As mentioned before, being based on a hack doesn't make it nice
>>>>> for upstream merging.
>>>>>
>>>>> The current approach uses the recommended way: the structure with
>>>>> cdev embedded should be dynamically allocated. Well, we could merge
>>>>> media_device and media_devnode, but, in this case, we'll need to
>>>>> not embed media_device, in order to avoid hacks like the above.    
>>>>
>>>> The current approach is simply not enough, be cdev allocated separately from
>>>> media_devnode or not: the drivers have no way to properly release memory
>>>> related to the media devices nor the v4l2 sub-devices. That memory will get
>>>> accessed through IOCTL calls: simply checking that a device was registered
>>>> at one point does not mean it continues to be registered in another point of
>>>> time in the future, unless the two operations are serialised in a way or
>>>> another.  
>>>
>>> Huh? The current approach relies on kref.
>>>   
>>>>>     
>>>>>>> - change all drivers as the kAPI changes;      
>>>>>>
>>>>>> The patchset actually adds new APIs rather than changing the OLD one --- as
>>>>>> the old one was simply that drivers were responsible for allocating the data
>>>>>> structures related to a media device. Existing drivers should continue to
>>>>>> work as they did before without changes.    
>>>>>
>>>>> Are you sure? Did you try the tests we did with binding/unbind, device
>>>>> removal/insert and probe/remove of em28xx with your patches applied?    
>>>>
>>>> I haven't tested that but as a matter of fact, I think I indeed have such
>>>> device so I could test it. Changes on the DVB side would be needed as well
>>>> in order to benefit from the API for allocating the media device.
>>>>   
>>>>>
>>>>> In that regard, you should really test it on a USB device, which
>>>>> is hot-pluggable. There, you'll see a lot more memory lifetime issues
>>>>> than on omap3.    
>>>>
>>>> I'm not so sure about USB devices: unbinding works the same way whether the
>>>> device is actually hot-pluggable. Still testing with different kind of
>>>> devices definitely does help to root out issues, that's for sure.
>>>>   
>>>>>     
>>>>>> Naturally, to get the full benefits of the changes, driver changes will also be
>>>>>> required (see the beginning of the message).    
>>>>>
>>>>> The test cases we did work on em28xx. If, after each patch of this series,
>>>>> a regression happens, you need to address it. I suspect that, even applying
>>>>> the entire series, there will still be regressions, as I don't see any
>>>>> changes to em28xx on this patch series.    
>>>>
>>>> That's true, I've only changed the omap3isp driver so far as I wanted to get
>>>> feedback on the framework changes.
>>>>   
>>>>>     
>>>>>> The set has been posted as RFC in order to get reviews. It makes no sense to
>>>>>> convert all the drivers and then start changing APIs, affecting all those
>>>>>> converted drivers.    
>>>>>
>>>>> Well, while it is not complete and still causes regressions, it can't be
>>>>> considered ready for upstream review.
>>>>>     
>>>>>>>
>>>>>>> - be git bisectable, e. g. all patches should compile and run fine
>>>>>>>   after each single patch, without introducing regressions.      
>>>>>>
>>>>>> Compilation has already been tested (on ARM) on each patch applied in order.    
>>>>>
>>>>> Good, but the best is to test it also on x86. Please notice that
>>>>> just compiling doesn't ensure that it doesn't introduce regressions.
>>>>>
>>>>> You should do your best to avoid regressions on every single patch
>>>>> on your patch series.    
>>>>
>>>> Certainly. Other than that, there would be fewer patches than there is
>>>> now...
>>>>   
>>>>>     
>>>>>>>
>>>>>>> That probably means that the series should be tested not only on
>>>>>>> omap3, but also on some other device drivers.      
>>>>>>
>>>>>> I fully agree with that. More review, testing and changes to at least some
>>>>>> drivers (mostly for removable devices) will be needed before merging them,
>>>>>> that's for sure.    
>>>>>
>>>>> Good! One more point we agree :-)    
>>>>
>>>> That's progress. It's a good start but we need more than that.
>>>>   
>>>
>>> Thanks,
>>> Mauro
>>
> 
> 
> 
> Thanks,
> Mauro
> 




* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-13 22:23                       ` Shuah Khan
@ 2016-12-15 10:39                         ` Laurent Pinchart
  2016-12-15 14:56                           ` Shuah Khan
  0 siblings, 1 reply; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-15 10:39 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Mauro Carvalho Chehab, Sakari Ailus, Sakari Ailus, linux-media, hverkuil

Hello,

On Tuesday 13 Dec 2016 15:23:53 Shuah Khan wrote:
> On 12/13/2016 05:24 AM, Mauro Carvalho Chehab wrote:
> > Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> >> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> >>> Hi Sakari,
> >>> 
> >>> I answered you point to point below, but I suspect that you missed how
> >>> the current approach works. So, I decided to write a quick summary here.
> >>> 
> >>> The character devices /dev/media? are created via cdev, which relies on a
> >>> kobject per device, which has an embedded struct kref inside.
> >>> 
> >>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> >>> struct device, when the code does:
> >>> 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
> >>> 
> >>> before calling cdev_add().
> >>> 
> >>> The current lifetime management is actually based on cdev's kobject's
> >>> refcount, provided by its embedded kref.
> >>> 
> >>> The kref warrants that any data associated with /dev/media0 won't be
> >>> freed if there are any pending system calls. In other words, when
> >>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
> >>> and will call kobject_put().
> >>> 
> >>> If the refcount is zero, it will call devnode->dev.release(). If the
> >>> kobject refcount is not zero, the data won't be freed.
> >>> 
> >>> So, in the best case scenario, there are no open file descriptors
> >>> by the time the media device node is unregistered. So, it will free
> >>> everything.
> >>> 
> >>> In the worst case scenario, e.g. when the driver is removed or
> >>> unbound while /dev/media0 has some open file descriptor(s),
> >>> the cdev logic will do the proper lifetime management.
> >>> 
> >>> In such a case, /dev/media0 disappears from the file system, so another
> >>> open is not possible anymore. The data structures will remain allocated
> >>> until all associated file descriptors are closed.
> >>> 
> >>> When all file descriptors are closed, the data will be freed.
> >>> 
> >>> At that time, it will call an optional dev.release() callback,
> >>> responsible for freeing any other data struct that the driver allocated.
> >> 
> >> The patchset does not change this. It's not a question of the
> >> media_devnode struct either. That's not an issue.
> >> 
> >> The issue is rather what else can be accessed through the media device
> >> and other interfaces. As IOCTLs are not serialised with device removal
> >> (which now releases much of the data structures)
> > 
> > Huh? ioctls are serialized with struct device removal. The Driver core
> > warrants that.

Code references please.
 
> >> there's a high chance of accessing
> >> released memory (or mutexes that have been already destroyed). An example
> >> of that is here, stopping a running pipeline after unbinding the device.
> >> What happens there is that the media device is released whilst it's in
> >> use through the video device.
> >> 
> >> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
> > 
> > It is not clear from the logs what the driver tried to do, but
> > that sounds like a driver's bug, which was not prepared to properly
> > handle unbinds.
> > 
> > The problem here is that isp_video_release() is called by V4L2
> > release logic, and not by the MC one:
> > 
> > static const struct v4l2_file_operations isp_video_fops = {
> > 	.owner		= THIS_MODULE,
> > 	.open		= isp_video_open,
> > 	.release	= isp_video_release,
> > 	.poll		= vb2_fop_poll,
> > 	.unlocked_ioctl	= video_ioctl2,
> > 	.mmap		= vb2_fop_mmap,
> > };
> > 
> > It seems that the driver's logic allows it to be called before or
> > after destroying the MC.
> 
> Right, isp_video_release() will definitely be called after driver is
> gone which means media device is gone and the device itself.

Certainly not after the driver is gone (as in the module being unloaded from 
memory), but it can be called after the device is unbound from the driver, 
yes.

> Both au0828 and em28xx have these release handlers. Neither one uses
> devm resource for their device structs.

And no driver exposing objects to userspace-accessible code paths should. I've 
been pointing at how devm_kzalloc() is abused for more than a year now, it's 
nice to see that people slowly start listening.

> Also, both em28xx and au0828 keep disconnected state and have logic
> to detect the state of the driver and device. em28xx holds reference
> to v4l2->ref

That's very, very wrong. The v4l2_device::ref field must *not* be touched by 
drivers. Acquiring and releasing references to v4l2_device instances must be 
done with v4l2_device_get() and v4l2_device_put(), and the structure has a 
release handler that drivers can use. Why do people write such horrible code 
that pokes at private fields ?
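
As a rough illustration of the pattern, here is a userspace sketch. The
fake_v4l2_device type and its helpers are stand-ins written for this example
only; they imitate the get/put plus release handler arrangement and are not
the real v4l2_device code, and struct my_driver is a hypothetical driver
structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for a refcounted core object with a release hook. */
struct fake_v4l2_device {
	int ref;	/* private: only the helpers below touch it */
	void (*release)(struct fake_v4l2_device *vdev);
};

static void fake_v4l2_device_init(struct fake_v4l2_device *vdev,
				  void (*release)(struct fake_v4l2_device *))
{
	vdev->ref = 1;
	vdev->release = release;
}

static void fake_v4l2_device_get(struct fake_v4l2_device *vdev)
{
	assert(vdev->ref > 0);
	vdev->ref++;
}

/* Returns 1 if this put released the object, 0 otherwise. */
static int fake_v4l2_device_put(struct fake_v4l2_device *vdev)
{
	assert(vdev->ref > 0);
	if (--vdev->ref)
		return 0;
	if (vdev->release)
		vdev->release(vdev);
	return 1;
}

/* Hypothetical driver state embedding the refcounted object. */
struct my_driver {
	int id;
	struct fake_v4l2_device vdev;
};

static int my_driver_freed;	/* lets the example observe the release */

/* Release handler: the only place the driver structure is freed. */
static void my_driver_release(struct fake_v4l2_device *vdev)
{
	struct my_driver *drv = container_of(vdev, struct my_driver, vdev);

	my_driver_freed++;
	free(drv);
}

/* Returns 0 if the structure is freed on the last put only. */
static int release_handler_example(void)
{
	struct my_driver *drv = calloc(1, sizeof(*drv));
	int early, last;

	if (!drv)
		return -1;
	my_driver_freed = 0;
	fake_v4l2_device_init(&drv->vdev, my_driver_release);
	fake_v4l2_device_get(&drv->vdev);		/* e.g. an open file handle */
	early = fake_v4l2_device_put(&drv->vdev);	/* disconnect drops its ref */
	last = fake_v4l2_device_put(&drv->vdev);	/* last close frees drv */
	return (early == 0 && last == 1 && my_driver_freed == 1) ? 0 : -1;
}
```

The driver never reads or writes the counter itself; it only calls the
helpers and frees its own structure from the release handler.
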

> and releases the reference in em28xx_v4l2_close() which is
> its v4l2_file_operations .release handler. It also makes sure to not
> touch device hardware if device is disconnected.
> 
> Also, media graph access is done only when it has a valid media_device.
> au0828 allocates media_device struct and it gets free'd when it does
> its unregister sequence. Subsequent calls will check if it is null.

This is very wrong too. Don't try to handle data structures being pulled from 
under the driver's feet at random times. At best you will end up with races. 
Data structures must instead be properly refcounted.

> It also does checks to see if media_device is registered or not in
> some cases.
> 
> isp_video_release() isn't safe to be called after isp device is gone,
> let alone media_device. Since isp is a devm resource, it is long
> gone when device_release() releases managed resources.
> 
> I agree with Mauro that this is a driver problem.

No. There *is* a driver problem caused by devm_kzalloc() usage, and that 
problem *must* be fixed, but the media device life time management is also 
completely broken in core code.

> Mauro and I did a lot of work to get the USB drivers (em28xx and au0828) to
> handle disconnect and unbind cases even before the media controller support
> was added to them.
> 
> I think what needs to happen is:
> 
> 1. Remove devm use from omap3

Absolutely.

> 2. Make sure media graph isn't accessed after media_device is unregistered

No way. We need to access the graph from the release handlers of the 
userspace-exposed structures (videodev and possibly media_device). The 
media_device structure must *not* disappear at unregistration time.

> 3. Take reference to v4l2 device to be able to make sanity checks from
>    isp_video_release() to determine if media_device is still around and
>    then do stop stream etc. It has to keep state.
> 
> I agree with Mauro that this is a driver problem. Mauro and I did a lot
> of work to get the USB drivers (em28xx and au0828) to handle disconnect
> and unbind cases even before the media controller support was added to
> them.
> 
> Please don't pursue this RFC series that makes mc-core changes until
> omap3 driver problems are addressed. There is no need to change the
> core unless it is necessary.

It is necessary as has been explained countless times, and will become more 
and more necessary as media_device instances get shared between multiple 
drivers, which is currently attempted *without* reference counting.

> I would be happy to help, unfortunately I don't have an omap3 device
> to fix and test problems. I am unable to find any omap3 devices. The
> one I have isn't good.

-- 
Regards,

Laurent Pinchart



* Re: [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management
  2016-08-26 23:43 ` [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management Sakari Ailus
@ 2016-12-15 11:23   ` Laurent Pinchart
  2016-12-15 11:39     ` Sakari Ailus
  2016-12-16 13:32     ` Sakari Ailus
  0 siblings, 2 replies; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-15 11:23 UTC (permalink / raw)
  To: Sakari Ailus; +Cc: linux-media, hverkuil, mchehab, shuahkh

Hi Sakari,

Thank you for the patch.

On Saturday 27 Aug 2016 02:43:29 Sakari Ailus wrote:
> devm functions are fine for managing resources that are directly related
> to the device at hand and that have no other dependencies. However, a
> process holding a file handle to a device created by a driver for a device
> may result in the file handle left behind after the device is long gone.
> This will result in accessing released (and potentially reallocated)
> memory.
> 
> Instead, rely on the media device which will stick around until all users
> are gone.

Could you move this patch to the beginning of the series to show that 
converting the driver away from devm_* isn't enough to fix the problem that 
the series tries to address ?

> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> ---
>  drivers/media/platform/omap3isp/isp.c         | 38 +++++++++++++++++-------
>  drivers/media/platform/omap3isp/ispccp2.c     |  3 ++-
>  drivers/media/platform/omap3isp/isph3a_aewb.c | 20 +++++++++-----
>  drivers/media/platform/omap3isp/isph3a_af.c   | 20 +++++++++-----
>  drivers/media/platform/omap3isp/isphist.c     |  5 ++--
>  drivers/media/platform/omap3isp/ispstat.c     |  2 ++
>  6 files changed, 63 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
> index 689efe8..262ddf7 100644
> --- a/drivers/media/platform/omap3isp/isp.c
> +++ b/drivers/media/platform/omap3isp/isp.c
> @@ -1370,7 +1370,7 @@ static int isp_get_clocks(struct isp_device *isp)
>  	unsigned int i;
> 
>  	for (i = 0; i < ARRAY_SIZE(isp_clocks); ++i) {
> -		clk = devm_clk_get(isp->dev, isp_clocks[i]);
> +		clk = clk_get(isp->dev, isp_clocks[i]);
>  		if (IS_ERR(clk)) {
>  			dev_err(isp->dev, "clk_get %s failed\n", isp_clocks[i]);
>  			return PTR_ERR(clk);
> @@ -1382,6 +1382,14 @@ static int isp_get_clocks(struct isp_device *isp)
>  	return 0;
>  }
> 
> +static void isp_put_clocks(struct isp_device *isp)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(isp_clocks); ++i)
> +		clk_put(isp->clock[i]);
> +}
> +
>  /*
>   * omap3isp_get - Acquire the ISP resource.
>   *
> @@ -1596,7 +1604,6 @@ static void isp_unregister_entities(struct isp_device *isp)
>  	omap3isp_stat_unregister_entities(&isp->isp_af);
>  	omap3isp_stat_unregister_entities(&isp->isp_hist);
> 
> -	v4l2_device_unregister(&isp->v4l2_dev);

This isn't correct. The v4l2_device instance should be unregistered here, to 
make sure that the subdev nodes are unregistered too. And even if moving the 
function call was correct, it should be done in a separate patch as it's 
unrelated to $SUBJECT.

>  	media_device_unregister(isp->media_dev);
>  	media_device_put(isp->media_dev);
>  }
> @@ -1951,6 +1958,8 @@ static void isp_release(struct media_device *mdev)
>  {
>  	struct isp_device *isp = media_device_priv(mdev);
> 
> +	v4l2_device_unregister(&isp->v4l2_dev);
> +
>  	isp_cleanup_modules(isp);
>  	isp_xclk_cleanup(isp);
> 
> @@ -1959,6 +1968,10 @@ static void isp_release(struct media_device *mdev)
>  	__omap3isp_put(isp, false);
> 
>  	media_entity_enum_cleanup(&isp->crashed);
> +
> +	isp_put_clocks(isp);
> +
> +	kfree(isp);
>  }
> 
>  static int isp_attach_iommu(struct isp_device *isp)
> @@ -2211,7 +2224,7 @@ static int isp_probe(struct platform_device *pdev)
>  	int ret;
>  	int i, m;
> 
> -	isp = devm_kzalloc(&pdev->dev, sizeof(*isp), GFP_KERNEL);
> +	isp = kzalloc(sizeof(*isp), GFP_KERNEL);
>  	if (!isp) {
>  		dev_err(&pdev->dev, "could not allocate memory\n");
>  		return -ENOMEM;
> @@ -2220,21 +2233,23 @@ static int isp_probe(struct platform_device *pdev)
>  	ret = of_property_read_u32(pdev->dev.of_node, "ti,phy-type",
>  				   &isp->phy_type);
>  	if (ret)
> -		return ret;
> +		goto error_release_isp;

I propose reorganizing this a bit more and moving DT parsing after the 
platform_set_drvdata() call. That way you can pass the ISP device to 
isp_of_parse_nodes() (which by the way also calls devm_* functions) and group 
the mutex_destroy() and various kfree() calls under a single error label. You 
might want to split this reorganization into a separate patch.

>  	isp->syscon = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
>  						      "syscon");
> -	if (IS_ERR(isp->syscon))
> -		return PTR_ERR(isp->syscon);
> +	if (IS_ERR(isp->syscon)) {
> +		ret = PTR_ERR(isp->syscon);
> +		goto error_release_isp;
> +	}
> 
>  	ret = of_property_read_u32_index(pdev->dev.of_node, "syscon", 1,
>  					 &isp->syscon_offset);
>  	if (ret)
> -		return ret;
> +		goto error_release_isp;
> 
>  	ret = isp_of_parse_nodes(&pdev->dev, &isp->notifier);
>  	if (ret < 0)
> -		return ret;
> +		goto error_release_isp;
> 
>  	isp->autoidle = autoidle;
> 
> @@ -2251,8 +2266,8 @@ static int isp_probe(struct platform_device *pdev)
>  	platform_set_drvdata(pdev, isp);
> 
>  	/* Regulators */
> -	isp->isp_csiphy1.vdd = devm_regulator_get(&pdev->dev, "vdd-csiphy1");
> -	isp->isp_csiphy2.vdd = devm_regulator_get(&pdev->dev, "vdd-csiphy2");
> +	isp->isp_csiphy1.vdd = regulator_get(&pdev->dev, "vdd-csiphy1");
> +	isp->isp_csiphy2.vdd = regulator_get(&pdev->dev, "vdd-csiphy2");

How about moving this to omap3isp_csiphy_init() ? You also need to release 
those regulators.

However, I wonder whether we couldn't keep devm_* for the clocks and 
regulators, as they shouldn't be touched anymore after remove() time.

>  	/* Clocks
>  	 *
> @@ -2384,6 +2399,9 @@ error_isp:
>  	__omap3isp_put(isp, false);
>  error:
>  	mutex_destroy(&isp->isp_mutex);
> +	isp_put_clocks(isp);
> +error_release_isp:
> +	kfree(isp);
> 
>  	return ret;
>  }
> diff --git a/drivers/media/platform/omap3isp/ispccp2.c b/drivers/media/platform/omap3isp/ispccp2.c
> index ca09523..d49ce8a 100644
> --- a/drivers/media/platform/omap3isp/ispccp2.c
> +++ b/drivers/media/platform/omap3isp/ispccp2.c
> @@ -1135,7 +1135,7 @@ int omap3isp_ccp2_init(struct isp_device *isp)
>  	 * TODO: Don't hardcode the usage of PHY1 (shared with CSI2c).
>  	 */
>  	if (isp->revision == ISP_REVISION_2_0) {
> -		ccp2->vdds_csib = devm_regulator_get(isp->dev, "vdds_csib");
> +		ccp2->vdds_csib = regulator_get(isp->dev, "vdds_csib");
>  		if (IS_ERR(ccp2->vdds_csib)) {
>  			dev_dbg(isp->dev,
>  				"Could not get regulator vdds_csib\n");
> @@ -1163,4 +1163,5 @@ void omap3isp_ccp2_cleanup(struct isp_device *isp)
> 
>  	omap3isp_video_cleanup(&ccp2->video_in);
>  	media_entity_cleanup(&ccp2->subdev.entity);
> +	regulator_put(ccp2->vdds_csib);
>  }
> diff --git a/drivers/media/platform/omap3isp/isph3a_aewb.c b/drivers/media/platform/omap3isp/isph3a_aewb.c
> index ccaf92f..130df8b 100644
> --- a/drivers/media/platform/omap3isp/isph3a_aewb.c
> +++ b/drivers/media/platform/omap3isp/isph3a_aewb.c

Please see my comments on isph3a_af.c below, they apply here too.

[snip]

> diff --git a/drivers/media/platform/omap3isp/isph3a_af.c b/drivers/media/platform/omap3isp/isph3a_af.c
> index 92937f7..7eecf97 100644
> --- a/drivers/media/platform/omap3isp/isph3a_af.c
> +++ b/drivers/media/platform/omap3isp/isph3a_af.c
> @@ -352,9 +352,10 @@ int omap3isp_h3a_af_init(struct isp_device *isp)
>  {
>  	struct ispstat *af = &isp->isp_af;
>  	struct omap3isp_h3a_af_config *af_cfg;
> -	struct omap3isp_h3a_af_config *af_recover_cfg;
> +	struct omap3isp_h3a_af_config *af_recover_cfg = NULL;
> +	int ret;
> 
> -	af_cfg = devm_kzalloc(isp->dev, sizeof(*af_cfg), GFP_KERNEL);
> +	af_cfg = kzalloc(sizeof(*af_cfg), GFP_KERNEL);
>  	if (af_cfg == NULL)
>  		return -ENOMEM;
> 
> @@ -364,12 +365,12 @@ int omap3isp_h3a_af_init(struct isp_device *isp)
>  	af->isp = isp;
> 
>  	/* Set recover state configuration */
> -	af_recover_cfg = devm_kzalloc(isp->dev, sizeof(*af_recover_cfg),
> -				      GFP_KERNEL);
> +	af_recover_cfg = kzalloc(sizeof(*af_recover_cfg), GFP_KERNEL);
>  	if (!af_recover_cfg) {
>  		dev_err(af->isp->dev, "AF: cannot allocate memory for recover "
>  				      "configuration.\n");
> -		return -ENOMEM;
> +		ret = -ENOMEM;
> +		goto err;
>  	}
> 
>  	af_recover_cfg->paxel.h_start = OMAP3ISP_AF_PAXEL_HZSTART_MIN;
> @@ -381,13 +382,20 @@ int omap3isp_h3a_af_init(struct isp_device *isp)
>  	if (h3a_af_validate_params(af, af_recover_cfg)) {
>  		dev_err(af->isp->dev, "AF: recover configuration is "
>  				      "invalid.\n");

Unrelated to this patch, but this shouldn't happen. I wonder whether we could 
remove the check.

> -		return -EINVAL;
> +		ret = -EINVAL;
> +		goto err;
>  	}
> 
>  	af_recover_cfg->buf_size = h3a_af_get_buf_size(af_recover_cfg);
>  	af->recover_priv = af_recover_cfg;
> 
>  	return omap3isp_stat_init(af, "AF", &h3a_af_subdev_ops);

You need to catch the omap3isp_stat_init() failures too. Something like

	ret = omap3isp_stat_init(af, "AF", &h3a_af_subdev_ops);

done:
	if (ret) {
		kfree(af_recover_cfg);
		kfree(af_cfg);
	}

	return ret;

and replacing the above goto err; with goto done; ?

> +
> +err:
> +	kfree(af_cfg);
> +	kfree(af_recover_cfg);
> +
> +	return ret;
>  }
> 
>  void omap3isp_h3a_af_cleanup(struct isp_device *isp)
> diff --git a/drivers/media/platform/omap3isp/isphist.c b/drivers/media/platform/omap3isp/isphist.c
> index 7138b04..976cab0 100644
> --- a/drivers/media/platform/omap3isp/isphist.c
> +++ b/drivers/media/platform/omap3isp/isphist.c
> @@ -477,9 +477,9 @@ int omap3isp_hist_init(struct isp_device *isp)
>  {
>  	struct ispstat *hist = &isp->isp_hist;
>  	struct omap3isp_hist_config *hist_cfg;
> -	int ret = -1;
> +	int ret;
> 
> -	hist_cfg = devm_kzalloc(isp->dev, sizeof(*hist_cfg), GFP_KERNEL);
> +	hist_cfg = kzalloc(sizeof(*hist_cfg), GFP_KERNEL);
>  	if (hist_cfg == NULL)
>  		return -ENOMEM;

There's a return in the middle of this function that should be turned into a 
goto done.

> @@ -517,6 +517,7 @@ int omap3isp_hist_init(struct isp_device *isp)

With a done label added right here.

>  	if (ret) {
>  		if (hist->dma_ch)
>  			dma_release_channel(hist->dma_ch);
> +		kfree(hist_cfg);
>  	}
> 
>  	return ret;
> diff --git a/drivers/media/platform/omap3isp/ispstat.c b/drivers/media/platform/omap3isp/ispstat.c
> index 1b9217d..1c1365f 100644
> --- a/drivers/media/platform/omap3isp/ispstat.c
> +++ b/drivers/media/platform/omap3isp/ispstat.c
> @@ -1059,4 +1059,6 @@ void omap3isp_stat_cleanup(struct ispstat *stat)
>  	mutex_destroy(&stat->ioctl_lock);
>  	isp_stat_bufs_free(stat);
>  	kfree(stat->buf);
> +	kfree(stat->priv);
> +	kfree(stat->recover_priv);
>  }

-- 
Regards,

Laurent Pinchart



* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-13 12:24                     ` Mauro Carvalho Chehab
  2016-12-13 22:23                       ` Shuah Khan
@ 2016-12-15 11:30                       ` Sakari Ailus
  2016-12-15 12:56                         ` Laurent Pinchart
  1 sibling, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2016-12-15 11:30 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shuah Khan, Sakari Ailus, linux-media, hverkuil, laurent.pinchart

Hi Mauro,

On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> Em Tue, 13 Dec 2016 12:53:05 +0200
> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> 
> > Hi Mauro,
> > 
> > On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> > > Hi Sakari,
> > > 
> > > I answered you point to point below, but I suspect that you missed how the 
> > > current approach works. So, I decided to write a quick summary here.
> > > 
> > > The character devices /dev/media? are created via cdev, which relies on a 
> > > kobject per device, which has an embedded struct kref inside.
> > > 
> > > Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> > > struct device, when the code does:
> > > 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
> > > 
> > > before calling cdev_add().
> > > 
> > > The current lifetime management is actually based on cdev's kobject's
> > > refcount, provided by its embedded kref.
> > > 
> > > The kref guarantees that any data associated with /dev/media0 won't be 
> > > freed if there are any pending system calls. In other words, when 
> > > cdev_del() is called, it will remove /dev/media0 from the filesystem, and
> > > will call kobject_put(). 
> > > 
> > > If the refcount is zero, it will call devnode->dev.release(). If the 
> > > kobject refcount is not zero, the data won't be freed.
> > > 
> > > So, in the best case scenario, there are no open file descriptors
> > > by the time the media device node is unregistered. So, it will free
> > > everything.
> > > 
> > > In the worst case scenario, e.g. when the driver is removed or 
> > > unbound while /dev/media0 has some open file descriptor(s),
> > > the cdev logic will do the proper lifetime management.
> > > 
> > > In such a case, /dev/media0 disappears from the file system, so another open
> > > is not possible anymore. The data structures will remain allocated until
> > > all associated file descriptors are closed.
> > > 
> > > When all file descriptors are closed, the data will be freed.
> > > 
> > > At that time, it will call an optional dev.release() callback,
> > > responsible for freeing any other data struct that the driver allocated.  
> > 
> > The patchset does not change this. It's not a question of the media_devnode
> > struct either. That's not an issue.
> > 
> > The issue is rather what else can be accessed through the media device and
> > other interfaces. As IOCTLs are not serialised with device removal (which
> > now releases much of the data structures) 
> 
> Huh? ioctls are serialized with struct device removal. The driver core
> guarantees that.

How?

As far as I can tell, there's nothing in the way of an IOCTL being in
progress on a character device which is registered by the driver for a
hardware device which is being removed.

vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
are taken during that path, which I believe is by design.

> 
> > there's a high chance of accessing
> > released memory (or mutexes that have been already destroyed). An example of
> > that is here, stopping a running pipeline after unbinding the device. What
> > happens there is that the media device is released whilst it's in use
> > through the video device.
> > 
> > <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
> 
> It is not clear from the logs what the driver tried to do, but
> that sounds like a driver bug: it was not prepared to properly
> handle unbinds.
> 
> The problem here is that isp_video_release() is called by V4L2
> release logic, and not by the MC one:
> 
> static const struct v4l2_file_operations isp_video_fops = {
> 	.owner		= THIS_MODULE,
> 	.open		= isp_video_open,
> 	.release	= isp_video_release,
> 	.poll		= vb2_fop_poll,
> 	.unlocked_ioctl	= video_ioctl2,
> 	.mmap		= vb2_fop_mmap,
> };
> 
> It seems that the driver's logic allows it to be called before or
> after destroying the MC.
> 
> Assuming that, if the OMAP3 driver is not used it works,
> it means that, if the isp_video_release() is called
> first, no errors will happen, but if MC is destroyed before
> V4L2 call to its .release() callback, as there's no logic at the
> driver that would detect it, isp_video_release() will be calling
> isp_video_streamoff(), which depends on the MC to work.
> 
> On a first glance, I can see two ways of fixing it:
> 
> 1) to increment devnode's device kobject refcount at OMAP3 .probe(), 
> decrementing it only at isp_video_release(). That will ensure that
> MC will only be removed after V4L2 removal.
> 
> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
> inside the MC .release() callback. 

This is a fair suggestion, indeed. Let me see what could be done there.
Albeit this is just *one* of the existing issues. It will not address all
problems fixed by the patchset.

> 
> That could be done by overriding the dev.release() callback in the
> omap3 driver, as I discussed in my past e-mails, and flagging to the
> driver that it should not accept streamon anymore, as the hardware
> is being disconnected.

A mutex will be needed to serialise this with starting streaming.
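
As a rough illustration of that serialisation, the following userspace sketch
(not omap3isp code) guards a "disconnected" flag and the stream start with one
mutex. The names isp_model, isp_model_streamon() and isp_model_disconnect() are
hypothetical, and a pthread mutex stands in for whatever kernel lock would
actually be used. Once the disconnect path has run, a later stream start is
refused with -ENODEV, which mirrors what the USB drivers are described as doing.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the driver state relevant to streaming. */
struct isp_model {
	pthread_mutex_t lock; /* serialises streamon against disconnect */
	bool disconnected;
	bool streaming;
};

/* Start streaming unless the device has already been disconnected. */
static int isp_model_streamon(struct isp_model *isp)
{
	int ret = 0;

	pthread_mutex_lock(&isp->lock);
	if (isp->disconnected)
		ret = -ENODEV;
	else
		isp->streaming = true;
	pthread_mutex_unlock(&isp->lock);
	return ret;
}

/* Mark the device gone and stop any running stream, under the same lock. */
static void isp_model_disconnect(struct isp_model *isp)
{
	pthread_mutex_lock(&isp->lock);
	isp->disconnected = true;
	isp->streaming = false; /* stands in for stopping the pipeline */
	pthread_mutex_unlock(&isp->lock);
}

/* Returns the result of a streamon attempted after disconnect. */
static int isp_model_demo(void)
{
	struct isp_model isp = { .disconnected = false, .streaming = false };
	int ret;

	pthread_mutex_init(&isp.lock, NULL);
	if (isp_model_streamon(&isp) != 0) {
		pthread_mutex_destroy(&isp.lock);
		return 1; /* unexpected: device was still connected */
	}
	isp_model_disconnect(&isp);
	ret = isp_model_streamon(&isp);
	pthread_mutex_destroy(&isp.lock);
	return ret;
}
```

Because both paths take the same lock, a stream start either completes before
the disconnect (and is then stopped by it) or observes the flag and fails.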

> 
> Btw, that explains a lot why Shuah can't reproduce the stuff you're
> complaining on her USB hardware.
> 
> The USB subsystem has a .disconnect() callback that notifies
> the drivers that a device was unbound (likely physically removed).
> The way USB media drivers handle it is by returning -ENODEV to any
> V4L2 call that would try to touch at the hardware after unbound.
> 
> So, on au0828, there's no need to add any extra release logic.
> 
> > <URL:http://www.spinics.net/lists/linux-media/msg108943.html>
> > 
> > > 
> > > Em Mon, 28 Nov 2016 12:45:56 +0200
> > > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> > >   
> > > > Hi Mauro,
> > > > 
> > > > On Tue, Nov 22, 2016 at 03:44:29PM -0200, Mauro Carvalho Chehab wrote:  
> > > > > Em Mon, 14 Nov 2016 15:27:22 +0200
> > > > > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> > > > >     
> > > > > > Hi Mauro,
> > > > > > 
> > > > > > I'm replying below but let me first summarise the remaining problem area
> > > > > > that this patchset addresses.    
> > > > > 
> > > > > Sorry for answering too late. Somehow, I missed this email in the cloud.
> > > > >     
> > > > > > The problems you and Shuah have seen and partially addressed are related to
> > > > > > a larger picture which is the lifetime of (mostly) memory resources related
> > > > > > to various objects used by as well both the Media controller and V4L2
> > > > > > frameworks (including videobuf2) as the drivers which make use of these
> > > > > > frameworks.
> > > > > > 
> > > > > > The Media controller and V4L2 interfaces exposed by drivers consist of
> > > > > > multiple device nodes, data structures with interdependencies within the
> > > > > > frameworks themselves and dependencies from the driver's own data structures
> > > > > > towards the framework data structures. The Media device and the media graph
> > > > > > objects are central to the problem area as well.
> > > > > > 
> > > > > > So what are the issues then? Until now, we've attempted to regulate the
> > > > > > users' ability to access the devices at the time they're being unregistered
> > > > > > (and the associated memory released), but that approach does not really
> > > > > > scale: you have to make sure that the unregistering also will not take place
> > > > > > _during_ the system call --- not just in the beginning of it.
> > > > > >
> > > > > > The media graph contains media graph objects, some of which are media
> > > > > > entities (contained in struct video_device or struct v4l2_subdev, for
> > > > > > instance). Media entities as graph nodes have links to other entities. In
> > > > > > order to implement the system calls, the drivers do parse this graph in
> > > > > > order to obtain information they need to obtain from it. For instance, it's
> > > > > > not uncommon for an implementation for video node format enumeration to
> > > > > > figure out which sub-device the link from that video node leads to. Drivers
> > > > > > may also have similar paths they follow.
> > > > > > 
> > > > > > Interrupt handling may also be taking place during the device removal during
> > > > > > which a number of data structures are now freed. This really does call for a
> > > > > > solution based on reference counting.
> > > > > > 
> > > > > > This leads to the conclusion that all the memory resources that could be
> > > > > > accessed by the drivers or the kernel frameworks must stay intact until the
> > > > > > last file handle to the said devices is closed. Otherwise, there is a
> > > > > > possibility of accessing released memory.    
> > > > > 
> > > > > So far, we're aligned.
> > > > >     
> > > > > > Right now in a lot of the cases, such as for video device and sub-device
> > > > > > nodes, we do release the memory when a device (as in struct device) is being
> > > > > > unregistered. There simply is no way in the current mainline kernel to do
> > > > > > this safely.    
> > > > >     
> > > > > > Drivers do use devm_() family of functions to allocate
> > > > > > the memory of the media graph object and their internal data structures.    
> > > > > 
> > > > > Removing devm_() from those drivers seems to be the first thing to do,
> > > > > and it is independent from any MC rework.
> > > > > 
> > > > > As you'll see below, we have different opinions on other matters,
> > > > > so, my suggestion about how to proceed is that you should submit
> > > > > first the things we're aligned on.
> > > > > 
> > > > > In other words, please submit the patches that get rid of devm_()
> > > > > first. Then, we can address the remaining stuff.    
> > > > 
> > > > Removing devm_*() is needed, but when should the memory be released then?
> > > > There's no callback currently from the media device the driver could use.  
> > > 
> > > It should be easy to add a release callback if you need. Yet, I think you
> > > don't need a callback for that. Instead, you could just use the already
> > > existing one at struct device, e. g. export media_devnode_release() and,
> > > on drivers that need to release additional data, you would be doing something
> > > like:
> > > 
> > > 	static void my_devnode_release(struct device *cd)
> > > 	{
> > > 		// Some code that would release things before kfree(dev)
> > > 		kthread_stop(foo_thread);
> > > 		kfree(foo);
> > > 
> > > 		// will internally do a kfree(dev)
> > > 		media_devnode_release(cd);
> > > 
> > > 		// Some code that would release things after kfree(dev)
> > > 		kfree(bar);
> > > 	}  
> > 
> > I think we really want to make correct implementations easy for drivers, not
> > requiring e.g. to use the media_devnode interface directly. As device
> > removal isn't serialised with IOCTLs, every driver should do this in order
> > to prevent device driver's / framework IOCTL handlers operating on released
> > memory.
> 
> Well, it would be easy to add a callback that media_devnode_release()
> would call on drivers that need it.
> 
> As I said before, USB drivers don't need anything extra at devnode
> release. I'd say more, even PCI drivers won't likely need it, as they
> don't use MC to do things like streamoff.
> 
> I suspect that such special .release() logic is only needed on drivers
> that don't work without MC, e. g. subdev-based ones.
> 
> > 
> > > 
> > > And set the new release callback after registering the media device with:
> > > 
> > > 	media_device_register(...);
> > > 	devnode->dev.release = my_devnode_release;
> > > 
> > > The advantage of such an approach is that it allows controlling the order
> > > in which things will be freed/released.  
> > 
> > That's among the things the patchset does, but I think in a much nicer way.
> 
> A /21 patch series that breaks release on all drivers but OMAP3 doesn't seem
> to be a cleaner/nicer approach.

I've said multiple times this RFC set is proposing an MC core interface
change, and until the interface is agreed on or at least discussed, there is
no point in converting all existing drivers. It'd be just extra work for no
gain.

It's also debatable whether this breaks any other driver. It does widen the
time window during which bad stuff may happen (without driver changes) but
it does not add any new ones as far as I can tell.

> 
> > > > OTOH devm_*() interfaces are very convenient to use, it's a lot of extra
> > > > work for drivers to handle releasing all the resources. It'd be great to
> > > > find another object where to bind those resources. Still, device_release()
> > > > does first release devres resources and then calls the release() callback,
> > > > which obviously makes the setup problematic to begin with.  
> > > 
> > > Shuah's approach is providing another way to bind things. Yet, maybe
> > > it could still be possible to use devm_*(), if it has a way to
> > > control when devm will free their resources. I suspect that, if you call
> > > devm_free() during dev.release() callback, or if you use the same struct
> > > device that is associated with the cdev, devm will work.  
> > 
> > I wonder if we could use the media_devnode cdev's struct device to bind this
> > stuff to. It'd be gone when there's a certainty it'll no longer be needed.
> 
> Maybe, but the real problem here is that some data are associated to
> MC, while others are associated with V4L2. 
> 
> If you can identify what data is associated to MC, and provide a way
> to handle MC ".disconnect()" so that V4L2 won't be trying to use the
> MC-related data, then it would be safe to use devm to allocate memory.

Let's see what stopping the streaming in the omap3isp remove() handler would
look like, and what else needs to be serialised with that.

> 
> > The caveat is the release callback is called after the devres resources have
> > been released. So if a driver requires also the release callback, then it
> > has no longer access to memory allocated using devm_*() functions. I'd like
> > to have Laurent's opinion on this.
> > 
> > This solution is no longer enough when we have media devices where you can
> > remove entities, as those would only be released when the entire device is
> > gone. Or, there's a memory leak until removal of the media device. I don't
> > like that albeit there might be still very few practical problems.
> 
> Agreed.
> 
> > 
> > >   
> > > > >     
> > > > > > 
> > > > > > With this patchset:
> > > > > > 
> > > > > > - The media_device which again contains the media_devnode is allocated
> > > > > >   dynamically. The lifetime of the media device --- and the media graph
> > > > > >   objects it contains --- is bound to device nodes that are bound to the
> > > > > >   media device (video and sub-device nodes) as well as open file handles.    
> > > > > 
> > > > > No. Data structures with cdev embedded into them have their lifetime
> > > > > controlled by the driver's core, and are destroyed only when there's
> > > > > no pending fops. The current approach uses device's core dev.release()    
> > > > 
> > > > Fair enough; that part is indeed handled towards the user space as far as I
> > > > can tell. However that's still not enough: the media graph contains the
> > > > graph objects, and the media device that holds the graph, must outlive the
> > > > graph objects themselves.  
> > 
> > I meant to say that the media device, media graph and media graph objects
> > must stay around as long as they may be accessed from the user space. For
> > instance, the user may have a file handle opened from a video device, and the
> > media graph may be accessed through that file handle on media controller
> > enabled drivers. That's just one example.
> > 
> > This is what happens if you stop streaming in a pipeline after unbinding the
> > driver implementing the media device (same log as above):
> > 
> > <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
> 
> As I said, if this is a requirement, you could increase kobject's 
> reference. If not, you could just stop streaming when the media device
> has gone.
> 
> > 
> > > 
> > > Sorry, didn't follow you here. What's the sense of not freeing the media
> > > graph before destroying the struct device associated with /dev/media0? 
> > > In other words, what should outlive after chardev's data is freed?
> > > 
> > > Please notice that the driver's core kobject kref ensures that the device
> > > release code is called only after all file descriptors are closed, and
> > > no other syscall would affect the cdev.
> > >   
> > > > Also removing entities doesn't really work currently: touching an entity, a
> > > > link or any kind of a graph object is not guaranteed to work unless you hold
> > > > the media graph lock. And that's simply unfeasible.  
> > > 
> > > Sorry, again, didn't follow you here. The current strategy for adding
> > > and removing things at the graph relies on a lock, which serializes access
> > > to the graph, in order to avoid races if someone is trying to navigate on
> > > the graph while an object is being inserted or removed.  
> > 
> > The graph mutex is taken during graph walk but nothing guarantees that, say,
> > an entity that was obtained during the graph walk will stay around once the
> > graph mutex is released.
> > 
> > An alternative would be to add refcounts to entities. That'd allow removing
> > graph objects safely during the media device lifetime. The streaming would
> > certainly need to be stopped first though.
> 
> My first MC patch series added refcounts to all graph objects[1] ;)
> 
> [1] https://patchwork.linuxtv.org/patch/30766/
> 
> My original idea was to increment the entities' kref when links were
> created. This way, it would be easier to clean up stuff, as they
> could be destroyed in any order, especially with dynamic entity
> creation/removal. Also, graph traversal could increment the links'
> krefs to prevent them from being destroyed while navigating them,
> without needing to keep the media lock held for a long time.
> 
> I removed it on the second version because Laurent was unable to see
> any usage for that, but, IMHO, if properly implemented, it would
> help to support dynamic entities removal/insertion.

The media graph lock will get / is getting more and more problematic over
time, but for now, entity kref could be just enough to support this. Let's
see, anyway this is out of the scope right now IMO.

> 
> So, I don't mind if someone would send a patch adding it again.
> 
> > > It could be converted into a lockless approach (for example, using RCU),
> > > but this is a separate issue.  
> > 
> > V4L2 sub-devices, besides an entity, may contain a device node as well. The
> > data structures span multiple drivers and may span multiple sub-systems
> > (think of ALSA) as well. The media entity is embedded in a sub-device data
> > structure allocated by drivers. Drivers, also other drivers that walk the
> > media graph, do make use of this knowledge to obtain sub-devices and access
> > controls in them.
> 
> Well, a kref-based approach would avoid locks most of the time.
> Perhaps it could be combined with RCU.
> 
> > 
> > > 
> > > The removal code needs to use whatever lock (or lockless) schema we
> > > use to serialize the access to the graph.
> > >   
> > > > Just look at what the
> > > > drivers do with entities: they use the v4l2_subdev interface and the control
> > > > framework to access them.
> > > > 
> > > > These data structures contain struct media_entity in them, and that entity
> > > > is part of the media graph. Other drivers use entities e.g. to obtain
> > > > control values from them. References should be used to prevent releasing the
> > > > memory.  
> > > 
> > > References are used by the driver's core, using kobject_get() and
> > > kobject_put(). That warrants that dev.release() will only be called
> > > when nobody is using it anymore.  
> > 
> > Yes, but this does not reach entities. Their lifetime is not related to
> > that.
> 
> No. That's why, currently, we need to lock before adding/removing
> graph objects.
> 
> > >   
> > > > media_entity_get() and media_entity_put() do not do what you'd expect.  
> > > 
> > > Please elaborate.  
> > 
> > The functions simply get / put the module that owns the media entity. The
> > entities as such are not refcounted, and acquiring the driver's module does
> > not guarantee the entities aren't released.
> 
> Yes.
> 
> > >   
> > > > v4l2_subdev_call() should also verify that a sub-device is registered, and
> > > > make sure it will stay that way for the duration of call: the driver must be
> > > > able to expect the entity is accessible as the driver registered it.  
> > > 
> > > Yes, but I can't see how this is related to this discussion. Before
> > > unregistering struct device, you need to unbind the subdevs.
> > > 
> > > The only case I can see of calling v4l2_subdev_call() after all file
> > > descriptors are closed is if you have some kthread running. You need to 
> > > call kthread_stop() for such kthreads before freeing struct device.
> > > 
> > > You could do it at a my_devnode_release() if you need the kthread running
> > > even after closing all file descriptors, or even before that, before
> > > calling media_device_unregister().
> > >   
> > > > The same goes for the control framework.  
> > > 
> > > I don't think we have kthreads for controls. The control routines
> > > are called only when a file descriptor is opened. So, I don't see
> > > any possible issue with the control framework.  
> > 
> > This isn't about kthreads; other drivers do this as well through
> > user-initiated actions. Such as starting or stopping streaming.
> 
> As explained before, either streamoff should happen during MC removal
> or you need to increment kobject refcount to serialize the removal
> order.
> 
> > >   
> > > > As far as I remember, we somehow assumed that just acquiring the related
> > > > kernel modules would be enough to counter this but it is not.  
> > > 
> > > Well, if not, you could use kobject_get() and kobject_put() to increment
> > > or decrement the cdev's refcount. Yet, I suspect that, if the drivers are 
> > > properly designed, you won't need to manually touch at the kref.  
> > 
> > Entities are not refcounted. You can't get a kobject as there's none to get.
> 
> Entity removal is protected via mutex.

It is, but if you get a reference to an entity, the reference is only
guaranteed to be valid as long as you hold the mutex.
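
To illustrate the difference, here is a minimal user-space sketch (not
kernel code, and not what the patchset implements). It assumes a
hypothetical refcounted entity: the lookup takes a reference while the graph
lock is still held, so the pointer stays usable after the lock is dropped,
even if the entity is removed from the graph in the meantime. All names
(struct graph, graph_find_get(), entity_put(), ...) are made up for the
example, and the "release" only sets a flag where a real implementation
would free the memory.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define GRAPH_SLOTS 4

/* Hypothetical stand-in for a refcounted graph entity. */
struct entity {
	int id;
	int refcount;	/* protected by the graph lock */
	int released;	/* set by the final put; real code would free here */
};

struct graph {
	pthread_mutex_t lock;
	struct entity *slots[GRAPH_SLOTS];
};

/* Drop one reference; returns 1 if it was the last one. */
static int entity_put(struct graph *g, struct entity *e)
{
	int last;

	pthread_mutex_lock(&g->lock);
	assert(e->refcount > 0);
	last = (--e->refcount == 0);
	pthread_mutex_unlock(&g->lock);

	if (last)
		e->released = 1;
	return last;
}

/* Link an entity into the graph; the graph holds the initial reference. */
static int graph_add(struct graph *g, struct entity *e)
{
	int ret = -1;
	size_t i;

	pthread_mutex_lock(&g->lock);
	for (i = 0; i < GRAPH_SLOTS; i++) {
		if (!g->slots[i]) {
			e->refcount = 1;
			e->released = 0;
			g->slots[i] = e;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&g->lock);
	return ret;
}

/* Look up an entity and take a reference before the lock is dropped. */
static struct entity *graph_find_get(struct graph *g, int id)
{
	struct entity *found = NULL;
	size_t i;

	pthread_mutex_lock(&g->lock);
	for (i = 0; i < GRAPH_SLOTS; i++) {
		if (g->slots[i] && g->slots[i]->id == id) {
			found = g->slots[i];
			found->refcount++;
			break;
		}
	}
	pthread_mutex_unlock(&g->lock);
	return found;
}

/* Unlink an entity and drop the graph's own reference to it. */
static void graph_remove(struct graph *g, struct entity *e)
{
	int was_linked = 0;
	size_t i;

	pthread_mutex_lock(&g->lock);
	for (i = 0; i < GRAPH_SLOTS; i++) {
		if (g->slots[i] == e) {
			g->slots[i] = NULL;
			was_linked = 1;
			break;
		}
	}
	pthread_mutex_unlock(&g->lock);

	if (was_linked)
		entity_put(g, e);
}
```

With only the mutex and no refcount, graph_find_get() would have to return
with the lock still held, or the caller could end up using a pointer to an
entity that has already been removed and released.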

> 
> > >   
> > > > 
> > > > I would prefer to postpone this however, the patchset already does enough
> > > > for a single patchset. Fixing this properly would likely require wound/wait
> > > > mutexes for individual entities.
> > > >   
> > > > > callback to release memory.
> > > > > 
> > > > > In other words, dev.release() is only called after the driver's base
> > > > > knows that the cdev is not in use anymore. So, no ioctl() or any
> > > > > other syscalls on that point.
> > > > > 
> > > > > Ok, nothing prevents some driver to do the wrong thing, keeping a
> > > > > copy of struct device and using it after free, for example storing
> > > > > > it in devm-allocated memory, and printing some debug message
> > > > > after struct device is freed, but this is a driver's bug.
> > > > > 
> > > > > What really worries me on this series is that it seemed that you 
> > > > > > didn't understand how the current approach works. So, you decided
> > > > > to just revert it and start from scratch. This is dangerous, as
> > > > > it could cause problems to other scenarios than yours.    
> > > > 
> > > > I'm not quite sure what do you mean.
> > > > 
> > > > It may well be that the patchset will require changes but that's precisely
> > > > the reason why patches are reviewed before merging.  
> > > 
> > > From your comments and from your code, you didn't seem to realize that
> > > > the current approach relies on the struct device refcount. See above.
> > 
> > That refcount is only for struct media_devnode. It's simply not enough, as
> > I've elaborated:
> > 
> > - No serialisation between IOCTL and releasing media device memory.
> > 
> > 	- This causes that once the IOCTL call has begun, media device may
> > 	  be released, and this released memory can be accessed by the IOCTL
> > 	  handler.
> > 
> > - Drivers and frameworks that access the media device through other device
> >   nodes such as V4L2 devices will also access released memory.
> > 
> > There could be others.
> 
> See above.
> 
> 
> > 
> > >   
> > > > >     
> > > > > > - Care is taken that the unregistration process and releasing memory happens
> > > > > >   in the right order. This was not always the case previously.    
> > > > > 
> > > > > Freeing memory for struct media_devnode, struct device and struct cdev 
> > > > > > is currently handled by the driver's core, when it is known to be safe,
> > > > > and using the same logic that other subsystems do.    
> > > > 
> > > > That's simply not the case. Other sub-systems do not have graphs managed by
> > > > multiple device drivers for multiple physical devices that expose device
> > > > nodes through which all of those devices can be accessed. The problem domain
> > > > is far more complex than if you had a single physical device for which a
> > > > driver would expose a device node or two to the user space.  
> > > 
> > > No. The current approach uses the struct device associated with /dev/media0,
> > > created via cdev, to provide a refcount for the data associated with the
> > > character device.
> > > 
> > > The struct device kobject refcount ensures that everything associated
> > > with it will only be freed after the refcount goes to zero.
> > > 
> > > > As I said before, if there are any cases where the refcount is going
> > > early to zero, it is just a matter of adding a few kobject_get() and
> > > kobject_put() to ensure that this won't happen early, if the driver is
> > > so broken that it is unable to do the right refcount.  
> > 
> > That's correct, but it only applies to struct media_devnode. Nothing else.
> > Please see above.
> 
> Well, don't remove entities before stop using them.
> 
> > > > > We might do it different, but we need a strong reason to do it, as
> > > > > going away from the usual practice is dangerous.    
> > > > 
> > > > I think we already did that when we merged the original Media controller and
> > > > V4L2 sub-device patches...
> > > >   
> > > > >     
> > > > > > - The driver remains responsible for the memory of the video and sub-device
> > > > > >   nodes. However, now the Media controller provides a convenient callback to
> > > > > >   the driver to release any memory resources when the time has come to do
> > > > > >   so. This takes place just before the media device memory is released.    
> > > > > 
> > > > > Drivers could use devnode->dev.release for that. Of course, if they
> > > > > override it, they should be calling media_devnode_release() on their
> > > > > internal release functions.    
> > > > 
> > > > That'd be really hackish. The drivers currently don't deal with
> > > > media_devnode directly now and I don't think they should be obliged to.  
> > > 
> > > I'm not against adding a callback instead. However, that makes it lose
> > > > flexibility, as the callback will either be called before or after
> > > freeing struct device.
> > > 
> > > By overriding the dev.release callback, we have a finer control.
> > > 
> > > If you don't see any case where we'll be freeing data after freeing
> > > struct device, then a callback would work.
> > >   
> > > > >     
> > > > > > - Drivers that do not strictly need to be removable require no changes. The
> > > > > >   benefits of this set become tangible for any driver by changing how the
> > > > > >   driver allocates memory for the data structures. Ideally at least
> > > > > >   drivers for hot-removable devices should be converted.    
> > > > > 
> > > > > Drivers should allow device removal and/or driver removal. If you're
> > > > > doing any change here, you need to touch *all* drivers to use the new 
> > > > > way.    
> > > > 
> > > > Let's first agree on what needs to be fixed and how, and then think about
> > > > converting the drivers. Buggy code has a tendency to continue to be buggy
> > > > unless it is fixed (or replaced).  
> > > 
> > > > True, but as I said, this series creates buggy code when it ignores what
> > > was fixed already. Also, a patch series to be considered ready for
> > > upstream need to do the needed changes on all drivers it affects.
> > >   
> > > > > > In order to make the current drivers to behave well it is necessary to make
> > > > > > changes to how memory is allocated in the drivers. If you look at the sample
> > > > > > patches that are part of the set for the omap3isp driver, you'll find that
> > > > > > > around 95% of the changes are related to removing the use of devm_() family
> > > > > > of functions instead of Media controller API changes. In this regard, the
> > > > > > approach taken here requires very little if any additional overhead.    
> > > > > 
> > > > > Well, send the patches that do the 95% of the changes first e. g. devm_()
> > > > > removal, and check if you aren't using any dev_foo() printk after
> > > > > unregister, and send such patch series, without RFC. Then test what's
> > > > > still broken, if any and let's discuss with your results, in a way
> > > > > that we can all reproduce the issues you may be facing on other drivers
> > > > > that don't use devm*().    
> > > > 
> > > > As I said, there's currently no way to properly release these resources as
> > > > the driver won't receive a callback from media device release.  
> > > 
> > > If you're so convinced that it is needed and you won't be overriding
> > > media device's struct device release callback, just add it. It should
> > > be a 3 lines patch.  
> > 
> > Just the callback isn't enough. You need to get a reference to the kobject
> > when the graph components may be accessed.
> 
> Yes. Or protect it with a mutex.
> 
> > Should we add reference counts to entities, we could add functions to get
> > references to entities, and make the media device their parent. That'd be a
> > largish change but it might not affect that many drivers after all.
> 
> Adding krefs to graph objects can be handled inside the core, except for
> the drivers that implement their own graph traversal functions.

I think at least the omap3isp does. That's one reason to convert to what the
framework provides. I can check at some point if that'd be feasible.

-- 
Regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

* Re: [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management
  2016-12-15 11:23   ` Laurent Pinchart
@ 2016-12-15 11:39     ` Sakari Ailus
  2016-12-15 11:42       ` Laurent Pinchart
  2016-12-16 13:32     ` Sakari Ailus
  1 sibling, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2016-12-15 11:39 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Sakari Ailus, linux-media, hverkuil, mchehab, shuahkh

Hi Laurent,

Thanks for the review!

On Thu, Dec 15, 2016 at 01:23:50PM +0200, Laurent Pinchart wrote:
> Hi Sakari,
> 
> Thank you for the patch.
> 
> On Saturday 27 Aug 2016 02:43:29 Sakari Ailus wrote:
> > devm functions are fine for managing resources that are directly related
> > to the device at hand and that have no other dependencies. However, a
> > process holding a file handle to a device created by a driver for a device
> > may result in the file handle left behind after the device is long gone.
> > This will result in accessing released (and potentially reallocated)
> > memory.
> > 
> > Instead, rely on the media device which will stick around until all users
> > are gone.
> 
> Could you move this patch to the beginning of the series to show that 
> converting the driver away from devm_* isn't enough to fix the problem that 
> the series tries to address ?

Unfortunately not. The patch depends on the previous patch; the
isp_release() function is called once the last user of the device nodes (MC,
V4L2 and V4L2 sub-dev) is gone.

I'll also see what could be done based on Mauro's suggestion to move
streamoff to device removal. That could fix a number of problems (but not
all of them).

> 
> > Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> > ---
> >  drivers/media/platform/omap3isp/isp.c         | 38 +++++++++++++++++-------
> >  drivers/media/platform/omap3isp/ispccp2.c     |  3 ++-
> >  drivers/media/platform/omap3isp/isph3a_aewb.c | 20 +++++++++-----
> >  drivers/media/platform/omap3isp/isph3a_af.c   | 20 +++++++++-----
> >  drivers/media/platform/omap3isp/isphist.c     |  5 ++--
> >  drivers/media/platform/omap3isp/ispstat.c     |  2 ++
> >  6 files changed, 63 insertions(+), 25 deletions(-)
> > 
> > diff --git a/drivers/media/platform/omap3isp/isp.c b/drivers/media/platform/omap3isp/isp.c
> > index 689efe8..262ddf7 100644
> > --- a/drivers/media/platform/omap3isp/isp.c
> > +++ b/drivers/media/platform/omap3isp/isp.c
> > @@ -1370,7 +1370,7 @@ static int isp_get_clocks(struct isp_device *isp)
> >  	unsigned int i;
> > 
> >  	for (i = 0; i < ARRAY_SIZE(isp_clocks); ++i) {
> > -		clk = devm_clk_get(isp->dev, isp_clocks[i]);
> > +		clk = clk_get(isp->dev, isp_clocks[i]);
> >  		if (IS_ERR(clk)) {
> >  			dev_err(isp->dev, "clk_get %s failed\n", isp_clocks[i]);
> >  			return PTR_ERR(clk);
> > @@ -1382,6 +1382,14 @@ static int isp_get_clocks(struct isp_device *isp)
> >  	return 0;
> >  }
> > 
> > +static void isp_put_clocks(struct isp_device *isp)
> > +{
> > +	unsigned int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(isp_clocks); ++i)
> > +		clk_put(isp->clock[i]);
> > +}
> > +
> >  /*
> >   * omap3isp_get - Acquire the ISP resource.
> >   *
> > @@ -1596,7 +1604,6 @@ static void isp_unregister_entities(struct isp_device *isp)
> >  	omap3isp_stat_unregister_entities(&isp->isp_af);
> >  	omap3isp_stat_unregister_entities(&isp->isp_hist);
> > 
> > -	v4l2_device_unregister(&isp->v4l2_dev);
> 
> This isn't correct. The v4l2_device instance should be unregistered here, to 
> make sure that the subdev nodes are unregistered too. And even if moving the 

Good point, I'll fix that for the next revision.

> function call was correct, it should be done in a separate patch as it's 
> unrelated to $SUBJECT.
> 
> >  	media_device_unregister(isp->media_dev);
> >  	media_device_put(isp->media_dev);
> >  }
> > @@ -1951,6 +1958,8 @@ static void isp_release(struct media_device *mdev)
> >  {
> >  	struct isp_device *isp = media_device_priv(mdev);
> > 
> > +	v4l2_device_unregister(&isp->v4l2_dev);
> > +
> >  	isp_cleanup_modules(isp);
> >  	isp_xclk_cleanup(isp);
> > 
> > @@ -1959,6 +1968,10 @@ static void isp_release(struct media_device *mdev)
> >  	__omap3isp_put(isp, false);
> > 
> >  	media_entity_enum_cleanup(&isp->crashed);
> > +
> > +	isp_put_clocks(isp);
> > +
> > +	kfree(isp);
> >  }
> > 
> >  static int isp_attach_iommu(struct isp_device *isp)
> > @@ -2211,7 +2224,7 @@ static int isp_probe(struct platform_device *pdev)
> >  	int ret;
> >  	int i, m;
> > 
> > -	isp = devm_kzalloc(&pdev->dev, sizeof(*isp), GFP_KERNEL);
> > +	isp = kzalloc(sizeof(*isp), GFP_KERNEL);
> >  	if (!isp) {
> >  		dev_err(&pdev->dev, "could not allocate memory\n");
> >  		return -ENOMEM;
> > @@ -2220,21 +2233,23 @@ static int isp_probe(struct platform_device *pdev)
> >  	ret = of_property_read_u32(pdev->dev.of_node, "ti,phy-type",
> >  				   &isp->phy_type);
> >  	if (ret)
> > -		return ret;
> > +		goto error_release_isp;
> 
> I propose reorganizing this a bit more and moving DT parsing after the 
> platform_set_drvdata() call. That way you can pass the ISP device to 
> isp_of_parse_nodes() (which by the way also calls devm_* functions) and group 
> the mutex_destroy() and various kfree() calls under a single error label. You 
> might want to split this reorganization in a separate patch.

Ack.

> 
> >  	isp->syscon = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
> >  						      "syscon");
> > -	if (IS_ERR(isp->syscon))
> > -		return PTR_ERR(isp->syscon);
> > +	if (IS_ERR(isp->syscon)) {
> > +		ret = PTR_ERR(isp->syscon);
> > +		goto error_release_isp;
> > +	}
> > 
> >  	ret = of_property_read_u32_index(pdev->dev.of_node, "syscon", 1,
> >  					 &isp->syscon_offset);
> >  	if (ret)
> > -		return ret;
> > +		goto error_release_isp;
> > 
> >  	ret = isp_of_parse_nodes(&pdev->dev, &isp->notifier);
> >  	if (ret < 0)
> > -		return ret;
> > +		goto error_release_isp;
> > 
> >  	isp->autoidle = autoidle;
> > 
> > @@ -2251,8 +2266,8 @@ static int isp_probe(struct platform_device *pdev)
> >  	platform_set_drvdata(pdev, isp);
> > 
> >  	/* Regulators */
> > -	isp->isp_csiphy1.vdd = devm_regulator_get(&pdev->dev, "vdd-csiphy1");
> > -	isp->isp_csiphy2.vdd = devm_regulator_get(&pdev->dev, "vdd-csiphy2");
> > +	isp->isp_csiphy1.vdd = regulator_get(&pdev->dev, "vdd-csiphy1");
> > +	isp->isp_csiphy2.vdd = regulator_get(&pdev->dev, "vdd-csiphy2");
> 
> How about moving this to omap3isp_csiphy_init() ? You also need to release 
> those regulators.
> 
> However, I wonder whether we couldn't keep devm_* for the clocks and 
> regulators, as they shouldn't be touched anymore after remove() time.

Good point.

> 
> >  	/* Clocks
> >  	 *
> > @@ -2384,6 +2399,9 @@ error_isp:
> >  	__omap3isp_put(isp, false);
> >  error:
> >  	mutex_destroy(&isp->isp_mutex);
> > +	isp_put_clocks(isp);
> > +error_release_isp:
> > +	kfree(isp);
> > 
> >  	return ret;
> >  }
> > diff --git a/drivers/media/platform/omap3isp/ispccp2.c b/drivers/media/platform/omap3isp/ispccp2.c
> > index ca09523..d49ce8a 100644
> > --- a/drivers/media/platform/omap3isp/ispccp2.c
> > +++ b/drivers/media/platform/omap3isp/ispccp2.c
> > @@ -1135,7 +1135,7 @@ int omap3isp_ccp2_init(struct isp_device *isp)
> >  	 * TODO: Don't hardcode the usage of PHY1 (shared with CSI2c).
> >  	 */
> >  	if (isp->revision == ISP_REVISION_2_0) {
> > -		ccp2->vdds_csib = devm_regulator_get(isp->dev, "vdds_csib");
> > +		ccp2->vdds_csib = regulator_get(isp->dev, "vdds_csib");
> >  		if (IS_ERR(ccp2->vdds_csib)) {
> >  			dev_dbg(isp->dev,
> >  				"Could not get regulator vdds_csib\n");
> > @@ -1163,4 +1163,5 @@ void omap3isp_ccp2_cleanup(struct isp_device *isp)
> > 
> >  	omap3isp_video_cleanup(&ccp2->video_in);
> >  	media_entity_cleanup(&ccp2->subdev.entity);
> > +	regulator_put(ccp2->vdds_csib);
> >  }
> > diff --git a/drivers/media/platform/omap3isp/isph3a_aewb.c b/drivers/media/platform/omap3isp/isph3a_aewb.c
> > index ccaf92f..130df8b 100644
> > --- a/drivers/media/platform/omap3isp/isph3a_aewb.c
> > +++ b/drivers/media/platform/omap3isp/isph3a_aewb.c
> 
> Please see my comments on isph3a_af.c below, they apply here too.
> 
> [snip]
> 
> > diff --git a/drivers/media/platform/omap3isp/isph3a_af.c b/drivers/media/platform/omap3isp/isph3a_af.c
> > index 92937f7..7eecf97 100644
> > --- a/drivers/media/platform/omap3isp/isph3a_af.c
> > +++ b/drivers/media/platform/omap3isp/isph3a_af.c
> > @@ -352,9 +352,10 @@ int omap3isp_h3a_af_init(struct isp_device *isp)
> >  {
> >  	struct ispstat *af = &isp->isp_af;
> >  	struct omap3isp_h3a_af_config *af_cfg;
> > -	struct omap3isp_h3a_af_config *af_recover_cfg;
> > +	struct omap3isp_h3a_af_config *af_recover_cfg = NULL;
> > +	int ret;
> > 
> > -	af_cfg = devm_kzalloc(isp->dev, sizeof(*af_cfg), GFP_KERNEL);
> > +	af_cfg = kzalloc(sizeof(*af_cfg), GFP_KERNEL);
> >  	if (af_cfg == NULL)
> >  		return -ENOMEM;
> > 
> > @@ -364,12 +365,12 @@ int omap3isp_h3a_af_init(struct isp_device *isp)
> >  	af->isp = isp;
> > 
> >  	/* Set recover state configuration */
> > -	af_recover_cfg = devm_kzalloc(isp->dev, sizeof(*af_recover_cfg),
> > -				      GFP_KERNEL);
> > +	af_recover_cfg = kzalloc(sizeof(*af_recover_cfg), GFP_KERNEL);
> >  	if (!af_recover_cfg) {
> >  		dev_err(af->isp->dev, "AF: cannot allocate memory for recover "
> >  				      "configuration.\n");
> > -		return -ENOMEM;
> > +		ret = -ENOMEM;
> > +		goto err;
> >  	}
> > 
> >  	af_recover_cfg->paxel.h_start = OMAP3ISP_AF_PAXEL_HZSTART_MIN;
> > @@ -381,13 +382,20 @@ int omap3isp_h3a_af_init(struct isp_device *isp)
> >  	if (h3a_af_validate_params(af, af_recover_cfg)) {
> >  		dev_err(af->isp->dev, "AF: recover configuration is "
> >  				      "invalid.\n");
> 
> Unrelated to this patch, but this shouldn't happen. I wonder whether we could 
> remove the check.
> 
> > -		return -EINVAL;
> > +		ret = -EINVAL;
> > +		goto err;
> >  	}
> > 
> >  	af_recover_cfg->buf_size = h3a_af_get_buf_size(af_recover_cfg);
> >  	af->recover_priv = af_recover_cfg;
> > 
> >  	return omap3isp_stat_init(af, "AF", &h3a_af_subdev_ops);
> 
> You need to catch the omap3isp_stat_init() failures too. Something like
> 
> 	ret = omap3isp_stat_init(af, "AF", &h3a_af_subdev_ops);
> 
> done:
> 	if (ret) {
> 		kfree(af_recover_cfg);
> 		kfree(af_cfg);
> 	}
> 
> 	return ret;
> 
> and replacing the above goto err; with goto done; ?

Ack.
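
For reference, the single-exit shape suggested above can be sketched in
plain user-space C as follows. This is only an illustration under stated
assumptions, not the omap3isp code: struct stat_dev, struct stat_cfg,
stat_dev_init() and fake_stat_init() are hypothetical stand-ins, calloc()
and free() replace kzalloc() and kfree(), and the two fail_* parameters
exist only to exercise the error paths. Clearing the pointers on failure is
an addition made here so that a later cleanup cannot free them twice.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct stat_cfg {
	int buf_size;
};

struct stat_dev {
	struct stat_cfg *priv;
	struct stat_cfg *recover_priv;
};

/* Hypothetical stand-in for omap3isp_stat_init(); fails on request. */
static int fake_stat_init(struct stat_dev *dev, int fail)
{
	(void)dev;
	return fail ? -EIO : 0;
}

static int stat_dev_init(struct stat_dev *dev, int fail_alloc2, int fail_init)
{
	struct stat_cfg *cfg;
	struct stat_cfg *recover_cfg = NULL;
	int ret;

	cfg = calloc(1, sizeof(*cfg));
	if (!cfg)
		return -ENOMEM;

	/* fail_alloc2 simulates an allocation failure for the second buffer. */
	if (!fail_alloc2)
		recover_cfg = calloc(1, sizeof(*recover_cfg));
	if (!recover_cfg) {
		ret = -ENOMEM;
		goto done;
	}

	dev->priv = cfg;
	dev->recover_priv = recover_cfg;

	/* The final init step can fail too, so its result is checked as well. */
	ret = fake_stat_init(dev, fail_init);

done:
	if (ret) {
		/* free(NULL) is a no-op, so one label covers every failure. */
		free(recover_cfg);
		free(cfg);
		dev->priv = NULL;
		dev->recover_priv = NULL;
	}

	return ret;
}
```

The point of the single label is that the success path and every failure
path leave through the same place, so a newly added failure point cannot
forget one of the allocations.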

> 
> > +
> > +err:
> > +	kfree(af_cfg);
> > +	kfree(af_recover_cfg);
> > +
> > +	return ret;
> >  }
> > 
> >  void omap3isp_h3a_af_cleanup(struct isp_device *isp)
> > diff --git a/drivers/media/platform/omap3isp/isphist.c b/drivers/media/platform/omap3isp/isphist.c
> > index 7138b04..976cab0 100644
> > --- a/drivers/media/platform/omap3isp/isphist.c
> > +++ b/drivers/media/platform/omap3isp/isphist.c
> > @@ -477,9 +477,9 @@ int omap3isp_hist_init(struct isp_device *isp)
> >  {
> >  	struct ispstat *hist = &isp->isp_hist;
> >  	struct omap3isp_hist_config *hist_cfg;
> > -	int ret = -1;
> > +	int ret;
> > 
> > -	hist_cfg = devm_kzalloc(isp->dev, sizeof(*hist_cfg), GFP_KERNEL);
> > +	hist_cfg = kzalloc(sizeof(*hist_cfg), GFP_KERNEL);
> >  	if (hist_cfg == NULL)
> >  		return -ENOMEM;
> 
> There's a return in the middle of this function that should be turned into a 
> goto done.

Uh-oh. Will fix.

> 
> > @@ -517,6 +517,7 @@ int omap3isp_hist_init(struct isp_device *isp)
> 
> With a done label added right here.
> 
> >  	if (ret) {
> >  		if (hist->dma_ch)
> >  			dma_release_channel(hist->dma_ch);
> > +		kfree(hist_cfg);
> >  	}
> > 
> >  	return ret;
> > diff --git a/drivers/media/platform/omap3isp/ispstat.c b/drivers/media/platform/omap3isp/ispstat.c
> > index 1b9217d..1c1365f 100644
> > --- a/drivers/media/platform/omap3isp/ispstat.c
> > +++ b/drivers/media/platform/omap3isp/ispstat.c
> > @@ -1059,4 +1059,6 @@ void omap3isp_stat_cleanup(struct ispstat *stat)
> >  	mutex_destroy(&stat->ioctl_lock);
> >  	isp_stat_bufs_free(stat);
> >  	kfree(stat->buf);
> > +	kfree(stat->priv);
> > +	kfree(stat->recover_priv);
> >  }
> 

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

* Re: [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management
  2016-12-15 11:39     ` Sakari Ailus
@ 2016-12-15 11:42       ` Laurent Pinchart
  2016-12-15 11:45         ` Sakari Ailus
  0 siblings, 1 reply; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-15 11:42 UTC (permalink / raw)
  To: Sakari Ailus; +Cc: Sakari Ailus, linux-media, hverkuil, mchehab, shuahkh

Hi Sakari,

On Thursday 15 Dec 2016 13:39:56 Sakari Ailus wrote:
> On Thu, Dec 15, 2016 at 01:23:50PM +0200, Laurent Pinchart wrote:
> > On Saturday 27 Aug 2016 02:43:29 Sakari Ailus wrote:
> >> devm functions are fine for managing resources that are directly related
> >> to the device at hand and that have no other dependencies. However, a
> >> process holding a file handle to a device created by a driver for a
> >> device may result in the file handle left behind after the device is long
> >> gone. This will result in accessing released (and potentially
> >> reallocated) memory.
> >> 
> >> Instead, rely on the media device which will stick around until all
> >> users are gone.
> > 
> > Could you move this patch to the beginning of the series to show that
> > converting the driver away from devm_* isn't enough to fix the problem
> > that the series tries to address ?
> 
> Unfortunately not. The patch depends on the previous patch; the
> isp_release() function is called once the last user of the device nodes (MC,
> V4L2 and V4L2 sub-dev) is gone.

You can split that part out. The devm_* removal is independent and could be 
moved to the beginning of the series.

> I'll also see what could be done based on Mauro's suggestion to move
> streamoff to device removal. That could fix a number of problems (but not
> all of them).

I'll reply to that separately but it's not the best idea.

> >> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
> >> ---
> >> 
> >>  drivers/media/platform/omap3isp/isp.c         | 38 +++++++++++++-------
> >>  drivers/media/platform/omap3isp/ispccp2.c     |  3 ++-
> >>  drivers/media/platform/omap3isp/isph3a_aewb.c | 20 +++++++++-----
> >>  drivers/media/platform/omap3isp/isph3a_af.c   | 20 +++++++++-----
> >>  drivers/media/platform/omap3isp/isphist.c     |  5 ++--
> >>  drivers/media/platform/omap3isp/ispstat.c     |  2 ++
> >>  6 files changed, 63 insertions(+), 25 deletions(-)

-- 
Regards,

Laurent Pinchart


* Re: [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management
  2016-12-15 11:42       ` Laurent Pinchart
@ 2016-12-15 11:45         ` Sakari Ailus
  2016-12-15 11:57           ` Laurent Pinchart
  0 siblings, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2016-12-15 11:45 UTC (permalink / raw)
  To: Laurent Pinchart, Sakari Ailus; +Cc: linux-media, hverkuil, mchehab, shuahkh

Hi Laurent,

On 12/15/16 13:42, Laurent Pinchart wrote:
> You can split that part out. The devm_* removal is independent and could be 
> moved to the beginning of the series.

Where do you release the memory in that case? In driver's remove(), i.e.
this patch would simply move that code to isp_remove()?

-- 
Sakari Ailus
sakari.ailus@linux.intel.com

* Re: [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management
  2016-12-15 11:45         ` Sakari Ailus
@ 2016-12-15 11:57           ` Laurent Pinchart
  2016-12-15 19:17             ` Shuah Khan
  0 siblings, 1 reply; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-15 11:57 UTC (permalink / raw)
  To: Sakari Ailus; +Cc: Sakari Ailus, linux-media, hverkuil, mchehab, shuahkh

On Thursday 15 Dec 2016 13:45:25 Sakari Ailus wrote:
> Hi Laurent,
> 
> On 12/15/16 13:42, Laurent Pinchart wrote:
> > You can split that part out. The devm_* removal is independent and could
> > be moved to the beginning of the series.
> 
> Where do you release the memory in that case? In driver's remove(), i.e.
> this patch would simply move that code to isp_remove()?

Yes, the kfree() calls would be in isp_remove(). The patch will then be 
faithful to its $SUBJECT, and moving to a release() handler should be done in 
a separate patch.

-- 
Regards,

Laurent Pinchart


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 11:30                       ` Sakari Ailus
@ 2016-12-15 12:56                         ` Laurent Pinchart
  2016-12-15 14:03                           ` Hans Verkuil
  0 siblings, 1 reply; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-15 12:56 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Mauro Carvalho Chehab, Shuah Khan, Sakari Ailus, linux-media, hverkuil

Hi Sakari,

On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> > Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> >> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> >>> Hi Sakari,
> >>>
> >>> I answered you point to point below, but I suspect that you missed how
> >>> the current approach works. So, I decided to write a quick summary
> >>> here.
> >>>
> >>> The character devices /dev/media? are created via cdev, which relies on
> >>> a kobject per device, which has an embedded struct kref inside.
> >>>
> >>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> >>> struct device, when the code does:
> >>>   devnode->cdev.kobj.parent = &devnode->dev.kobj;
> >>>
> >>> before calling cdev_add().
> >>>
> >>> The current lifetime management is actually based on cdev's kobject's
> >>> refcount, provided by its embedded kref.
> >>>
> >>> The kref warrants that any data associated with /dev/media0 won't be 
> >>> freed if there are any pending system call. In other words, when 
> >>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
> >>> and will call kobject_put(). 
> >>>
> >>> If the refcount is zero, it will call devnode->dev.release(). If the 
> >>> kobject refcount is not zero, the data won't be freed.
> >>>
> >>> So, in the best case scenario, there's no opened file descriptors
> >>> by the time media device node is unregistered. So, it will free
> >>> everything.
> >>>
> >>> In the worst case scenario, e. g. when the driver is removed or
> >>> unbound while /dev/media0 has some opened file descriptor(s),
> >>> the cdev logic will do the proper lifetime management.
> >>>
> >>> In such a case, /dev/media0 disappears from the file system, so another
> >>> open is not possible anymore. The data structures will remain
> >>> allocated until all associated file descriptors are closed.
> >>>
> >>> When all file descriptors are closed, the data will be freed.
> >>>
> >>> At that time, it will call an optional dev.release() callback,
> >>> responsible to free any other data struct that the driver allocated.  
> >>
> >> The patchset does not change this. It's not a question of the
> >> media_devnode struct either. That's not an issue.
> >>
> >> The issue is rather what else can be accessed through the media device
> >> and other interfaces. As IOCTLs are not serialised with device removal
> >> (which now releases much of the data structures) 
> >
> > Huh? ioctls are serialized with struct device removal. The Driver core
> > warrants that.
> 
> How?
> 
> As far as I can tell, there's nothing in the way of an IOCTL being in
> progress on a character device which is registered by the driver for a
> hardware device which is being removed.
> 
> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
> are taken during that path, which I believe is by design.
> 
> >> there's a high chance of accessing
> >> released memory (or mutexes that have been already destroyed). An
> >> example of that is here, stopping a running pipeline after unbinding
> >> the device. What happens there is that the media device is released
> >> whilst it's in use through the video device.
> >>
> >> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
> >
> > It is not clear from the logs what the driver tried to do, but
> > that sounds like a driver's bug, which was not prepared to properly
> > handle unbinds.
> >
> > The problem here is that isp_video_release() is called by V4L2
> > release logic, and not by the MC one:
> >
> > static const struct v4l2_file_operations isp_video_fops = {
> >       .owner          = THIS_MODULE,
> >       .open           = isp_video_open,
> >       .release        = isp_video_release,
> >       .poll           = vb2_fop_poll,
> >       .unlocked_ioctl = video_ioctl2,
> >       .mmap           = vb2_fop_mmap,
> > };
> >
> > It seems that the driver's logic allows it to be called before or
> > after destroying the MC.
> >
> > Assuming that, if the OMAP3 driver is not used it works,
> > it means that, if the isp_video_release() is called
> > first, no errors will happen, but if MC is destroyed before
> > V4L2 call to its .release() callback, as there's no logic at the
> > driver that would detect it, isp_video_release() will be calling
> > isp_video_streamoff(), which depends on the MC to work.
> >
> > On a first glance, I can see two ways of fixing it:
> >
> > 1) to increment devnode's device kobject refcount at OMAP3 .probe(), 
> > decrementing it only at isp_video_release(). That will ensure that
> > MC will only be removed after V4L2 removal.

As soon as you have to dig deep in a structure to find a reference counter and 
increment it, bypassing all the API layers, you can be entirely sure that the 
solution is wrong.

> > 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
> > inside the MC .release() callback. 
> 
> This is a fair suggestion, indeed. Let me see what could be done there.
> Albeit this is just *one* of the existing issues. It will not address all
> problems fixed by the patchset.

We need to stop the hardware at .remove() time. That should not be linked to a 
videodev, v4l2_device or media_device .release() callback. When the .remove() 
callback returns the driver is not allowed to touch the hardware anymore. In 
particular, power domains might turn off clocks or power supplies, leading to 
invalid access faults if we try to access hardware registers.

USB devices get help from the USB core that cancels all USB operations in 
progress when they're disconnected. Platform devices don't have it as easy, 
and need to implement everything themselves. We thus need to stop the 
hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF ioctl at 
.remove() time. That could introduce other races between .remove() and the 
userspace API. A better solution is to make sure the objects that are needed 
at .release() time of the device node are all reference-counted and only 
released when the last reference goes away.
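
The split described above (stop the hardware in .remove(), keep the memory
alive until the last .release()) can be sketched in user space as follows.
This is a minimal single-threaded model, not omap3isp code: struct toy_isp
and every toy_* function are hypothetical names, and a real driver would
need locking between .remove() and the ioctl paths.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver state; the names are illustrative, not omap3isp's. */
struct toy_isp {
	int refcount;      /* one for the bound driver, one per open file handle */
	bool hw_present;   /* cleared in remove(): no register access afterwards */
	bool streaming;
	bool freed;        /* stands in for kfree() of the state */
	int hw_writes;     /* counts simulated register writes */
};

static void toy_isp_probe(struct toy_isp *isp)
{
	isp->refcount = 1;
	isp->hw_present = true;
	isp->streaming = false;
	isp->freed = false;
	isp->hw_writes = 0;
}

static void toy_hw_write(struct toy_isp *isp)
{
	assert(isp->hw_present);   /* touching hardware after remove() is a bug */
	isp->hw_writes++;
}

static void toy_isp_put(struct toy_isp *isp)
{
	assert(isp->refcount > 0);
	if (--isp->refcount == 0)
		isp->freed = true;
}

static int toy_isp_open(struct toy_isp *isp)
{
	if (!isp->hw_present)
		return -ENODEV;
	isp->refcount++;
	return 0;
}

static int toy_isp_streamon(struct toy_isp *isp)
{
	if (!isp->hw_present)
		return -ENODEV;
	toy_hw_write(isp);
	isp->streaming = true;
	return 0;
}

/* remove(): stop the hardware here, then drop only the driver's reference. */
static void toy_isp_remove(struct toy_isp *isp)
{
	if (isp->streaming) {
		toy_hw_write(isp);
		isp->streaming = false;
	}
	isp->hw_present = false;
	toy_isp_put(isp);
}

/* release() of the file handle: memory is still valid, hardware may be gone. */
static void toy_isp_release(struct toy_isp *isp)
{
	if (isp->streaming && isp->hw_present) {
		toy_hw_write(isp);
		isp->streaming = false;
	}
	toy_isp_put(isp);
}
```

In this model a file handle that stays open across toy_isp_remove() keeps the
state allocated, later calls fail with -ENODEV, and toy_isp_release() never
reaches the hardware.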

There are plenty of ways to try and work around the problem in drivers, some 
more racy than others, but if we require changes to all platform drivers to 
fix this we need to ensure that we get it right, not as half-baked hacks 
spread around the whole subsystem.

> > That could be done by overwriting the dev.release() callback at
> > omap3 driver, as I discussed on my past e-mails, and flagging the
> > driver that it should not accept streamon anymore, as the hardware
> > is being disconnecting.
> 
> A mutex will be needed to serialise the this with starting streaming.
> 
> > Btw, that explains a lot why Shuah can't reproduce the stuff you're
> > complaining on her USB hardware.
> >
> > The USB subsystem has a a .disconnect() callback that notifies
> > the drivers that a device was unbound (likely physically removed).
> > The way USB media drivers handle it is by returning -ENODEV to any
> > V4L2 call that would try to touch at the hardware after unbound.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 12:56                         ` Laurent Pinchart
@ 2016-12-15 14:03                           ` Hans Verkuil
  2016-12-15 14:32                             ` Mauro Carvalho Chehab
                                               ` (2 more replies)
  0 siblings, 3 replies; 89+ messages in thread
From: Hans Verkuil @ 2016-12-15 14:03 UTC (permalink / raw)
  To: Laurent Pinchart, Sakari Ailus
  Cc: Mauro Carvalho Chehab, Shuah Khan, Sakari Ailus, linux-media

On 15/12/16 13:56, Laurent Pinchart wrote:
> Hi Sakari,
>
> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>> Hi Sakari,
>>>>>
>>>>> I answered you point to point below, but I suspect that you missed how
>>>>> the current approach works. So, I decided to write a quick summary
>>>>> here.
>>>>>
>>>>> The character devices /dev/media? are created via cdev, with relies on
>>>>> a kobject per device, with has an embedded struct kref inside.
>>>>>
>>>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>>>> struct device, when the code does:
>>>>>   devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>>>
>>>>> before calling cdev_add().
>>>>>
>>>>> The current lifetime management is actually based on cdev's kobject's
>>>>> refcount, provided by its embedded kref.
>>>>>
>>>>> The kref warrants that any data associated with /dev/media0 won't be
>>>>> freed if there are any pending system call. In other words, when
>>>>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>>>>> and will call kobject_put().
>>>>>
>>>>> If the refcount is zero, it will call devnode->dev.release(). If the
>>>>> kobject refcount is not zero, the data won't be freed.
>>>>>
>>>>> So, in the best case scenario, there's no opened file descriptors
>>>>> by the time media device node is unregistered. So, it will free
>>>>> everything.
>>>>>
>>>>> In the worse case scenario, e. g. when the driver is removed or
>>>>> unbind while /dev/media0 has some opened file descriptor(s),
>>>>> the cdev logic will do the proper lifetime management.
>>>>>
>>>>> On such case, /dev/media0 disappears from the file system, so another
>>>>> open is not possible anymore. The data structures will remain
>>>>> allocated until all associated file descriptors are not closed.
>>>>>
>>>>> When all file descriptors are closed, the data will be freed.
>>>>>
>>>>> On that time, it will call an optional dev.release() callback,
>>>>> responsible to free any other data struct that the driver allocated.
>>>>
>>>> The patchset does not change this. It's not a question of the
>>>> media_devnode struct either. That's not an issue.
>>>>
>>>> The issue is rather what else can be accessed through the media device
>>>> and other interfaces. As IOCTLs are not serialised with device removal
>>>> (which now releases much of the data structures)
>>>
>>> Huh? ioctls are serialized with struct device removal. The Driver core
>>> warrants that.
>>
>> How?
>>
>> As far as I can tell, there's nothing in the way of an IOCTL being in
>> progress on a character device which is registered by the driver for a
>> hardware device which is being removed.
>>
>> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
>> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
>> are taken during that path, which I believe is by design.

chrdev_open() in fs/char_dev.c increases the refcount on open() and it is
decreased again on release(), thus ensuring that the cdev can never be removed
while in an ioctl.
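
A minimal user-space sketch of that open/release rule, under stated
assumptions: struct toy_devnode and the toy_devnode_* functions are
hypothetical stand-ins, not the kernel's struct cdev API, and the counter is
a plain int with no atomicity. It only models the lifetime of the node
itself, not of anything else reachable through it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a character device node; not struct cdev. */
struct toy_devnode {
	int refcount;      /* stands in for the embedded kobject/kref */
	bool registered;   /* false once the node is removed from /dev */
	bool *freed;       /* set on free so a caller can observe it */
};

static struct toy_devnode *toy_devnode_register(bool *freed)
{
	struct toy_devnode *node = malloc(sizeof(*node));

	if (!node)
		return NULL;
	node->refcount = 1;        /* reference held by the registering driver */
	node->registered = true;
	node->freed = freed;
	*freed = false;
	return node;
}

static void toy_devnode_put(struct toy_devnode *node)
{
	assert(node->refcount > 0);
	if (--node->refcount == 0) {
		*node->freed = true;
		free(node);
	}
}

/* open(): fails once unregistered, otherwise takes a reference. */
static int toy_devnode_open(struct toy_devnode *node)
{
	if (!node->registered)
		return -1;
	node->refcount++;
	return 0;
}

/* release(): closing a file handle drops the reference taken at open(). */
static void toy_devnode_release(struct toy_devnode *node)
{
	toy_devnode_put(node);
}

/* unregister: no new opens, and the driver's own reference goes away. */
static void toy_devnode_unregister(struct toy_devnode *node)
{
	node->registered = false;
	toy_devnode_put(node);
}
```

With one file handle open, toy_devnode_unregister() leaves the node
allocated; the memory is only freed by the matching toy_devnode_release().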

>>
>>>> there's a high chance of accessing
>>>> released memory (or mutexes that have been already destroyed). An
>>>> example of that is here, stopping a running pipeline after unbinding
>>>> the device. What happens there is that the media device is released
>>>> whilst it's in use through the video device.
>>>>
>>>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
>>>
>>> It is not clear from the logs what the driver tried to do, but
>>> that sounds like a driver's bug, with was not prepared to properly
>>> handle unbinds.
>>>
>>> The problem here is that isp_video_release() is called by V4L2
>>> release logic, and not by the MC one:
>>>
>>> static const struct v4l2_file_operations isp_video_fops = {
>>>       .owner          = THIS_MODULE,
>>>       .open           = isp_video_open,
>>>       .release        = isp_video_release,
>>>       .poll           = vb2_fop_poll,
>>>       .unlocked_ioctl = video_ioctl2,
>>>       .mmap           = vb2_fop_mmap,
>>> };
>>>
>>> It seems that the driver's logic allows it to be called before or
>>> after destroying the MC.
>>>
>>> Assuming that, if the OMAP3 driver is not used it works,
>>> it means that, if the isp_video_release() is called
>>> first, no errors will happen, but if MC is destroyed before
>>> V4L2 call to its .release() callback, as there's no logic at the
>>> driver that would detect it, isp_video_release() will be calling
>>> isp_video_streamoff(), with depends on the MC to work.
>>>
>>> On a first glance, I can see two ways of fixing it:
>>>
>>> 1) to increment devnode's device kobject refcount at OMAP3 .probe(),
>>> decrementing it only at isp_video_release(). That will ensure that
>>> MC will only be removed after V4L2 removal.
>
> As soon as you have to dig deep in a structure to find a reference counter and
> increment it, bypassing all the API layers, you can be entirely sure that the
> solution is wrong.

Indeed.

>
>>> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
>>> inside the MC .release() callback.
>>
>> This is a fair suggestion, indeed. Let me see what could be done there.
>> Albeit this is just *one* of the existing issues. It will not address all
>> problems fixed by the patchset.
>
> We need to stop the hardware at .remove() time. That should not be linked to a
> videodev, v4l2_device or media_device .release() callback. When the .remove()
> callback returns the driver is not allowed to touch the hardware anymore. In
> particular, power domains might clocks or power supplies, leading to invalid
> access faults if we try to access hardware registers.

Correct.

>
> USB devices get help from the USB core that cancels all USB operations in
> progress when they're disconnected. Platform devices don't have it as easy,
> and need to implement everything themselves. We thus need to stop the
> hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF ioctl at
> .remove() time.

Please don't. This shouldn't be done automatically.

> That could introduce other races between .remove() and the
> userspace API. A better solution is to make sure the objects that are needed
> at .release() time of the device node are all reference-counted and only
> released when the last reference goes away.
>
> There's plenty of way to try and work around the problem in drivers, some more
> racy than others, but if we require changes to all platform drivers to fix
> this we need to ensure that we get it right, not as half-baked hacks spread
> around the whole subsystem.

Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
device and I see no reason whatsoever to start modifying platform drivers just
because you can do an unbind. I know there are real hot-pluggable devices, and
getting this right for those is of course important.

If the omap3 is used as a testbed, then that's fine by me, but even then I
probably wouldn't want the omap3 code that makes this possible in the kernel.
It's just additional code for no purpose.

>>> That could be done by overwriting the dev.release() callback at
>>> omap3 driver, as I discussed on my past e-mails, and flagging the
>>> driver that it should not accept streamon anymore, as the hardware
>>> is being disconnecting.
>>
>> A mutex will be needed to serialise the this with starting streaming.
>>
>>> Btw, that explains a lot why Shuah can't reproduce the stuff you're
>>> complaining on her USB hardware.
>>>
>>> The USB subsystem has a a .disconnect() callback that notifies
>>> the drivers that a device was unbound (likely physically removed).
>>> The way USB media drivers handle it is by returning -ENODEV to any
>>> V4L2 call that would try to touch at the hardware after unbound.
>

In my view the main problem is that the media core is bound to a struct
device set by the driver that creates the MC. But since the MC gives an
overview of lots of other (sub)devices the refcount of the media device
should be increased for any (sub)device that adds itself to the MC and
decreased for any (sub)device that is removed. Only when the very last
user goes away can the MC memory be released.

The memory/refcounting associated with device nodes is unrelated to this:
once a devnode is unregistered it will be removed in /dev, and once the
last open fh closes any memory associated with the devnode can be released.
That will also decrease the refcount to its parent device.

This also means that it is a bad idea to embed devnodes in a larger struct.
They should be allocated separately, and freed once the devnode is
unregistered and the last open filehandle is closed.

Then the parent's device refcount is decreased, and that may now call its
release callback if the refcount reaches 0.

For the media controller's device: any other device driver that needs access
to it needs to increase that device's refcount, and only when those devices
are released will they decrease the MC device's refcount.

And only when that refcount goes to 0 can we finally free everything.
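
The scheme in the paragraphs above can be sketched as a small user-space
model. It is only an illustration under assumptions: struct toy_mdev,
struct toy_user and the toy_* functions are hypothetical and do not
correspond to struct media_device or any existing media core call, and there
is no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the media controller device. */
struct toy_mdev {
	int refcount;
	bool released;   /* set when the count hits 0 */
};

/* Hypothetical model of a (sub)device or devnode that uses the MC. */
struct toy_user {
	struct toy_mdev *mdev;
};

static void toy_mdev_init(struct toy_mdev *mdev)
{
	mdev->refcount = 1;   /* reference owned by the driver that created the MC */
	mdev->released = false;
}

static void toy_mdev_get(struct toy_mdev *mdev)
{
	assert(!mdev->released);
	mdev->refcount++;
}

static void toy_mdev_put(struct toy_mdev *mdev)
{
	assert(mdev->refcount > 0);
	if (--mdev->refcount == 0)
		mdev->released = true;   /* where the real memory would be freed */
}

/* A (sub)device adding itself to the MC pins the MC. */
static void toy_user_attach(struct toy_user *user, struct toy_mdev *mdev)
{
	toy_mdev_get(mdev);
	user->mdev = mdev;
}

/* Called from the user's own release, after its last file handle closed. */
static void toy_user_release(struct toy_user *user)
{
	toy_mdev_put(user->mdev);
	user->mdev = NULL;
}
```

The creating driver can drop its reference at unbind time; the MC is only
released after every attached user has gone through toy_user_release().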

With regards to the opposition to reverting those initial patches, I'm
siding with Greg KH. Just revert the bloody patches. It worked most of the
time before those patches, so reverting really won't cause bisect problems.

Just revert and build up things as they should.

Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
parent pointer for example) and many drivers (including omap3isp) embed
video_device, which is wrong and can lead to complications.

I'm to blame for the embedding since I thought that was a good idea at one
time. I now realize that it isn't. Sorry about that...

And because the cdev of the video_device doesn't know about the parent
device it is (I think) possible that the parent device is released before
the cdev is released. Which can result in major problems.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 14:03                           ` Hans Verkuil
@ 2016-12-15 14:32                             ` Mauro Carvalho Chehab
  2016-12-15 14:45                               ` Hans Verkuil
  2016-12-16 16:43                               ` Laurent Pinchart
  2016-12-15 14:45                             ` Shuah Khan
  2016-12-16 15:07                             ` Sakari Ailus
  2 siblings, 2 replies; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-15 14:32 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Laurent Pinchart, Sakari Ailus, Shuah Khan, Sakari Ailus, linux-media

Em Thu, 15 Dec 2016 15:03:36 +0100
Hans Verkuil <hverkuil@xs4all.nl> escreveu:

> On 15/12/16 13:56, Laurent Pinchart wrote:
> > Hi Sakari,
> >
> > On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:  
> >> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:  
> >>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:  
> >>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:  
> >>>>> Hi Sakari,
> >>>>>


> > There's plenty of way to try and work around the problem in drivers, some more
> > racy than others, but if we require changes to all platform drivers to fix
> > this we need to ensure that we get it right, not as half-baked hacks spread
> > around the whole subsystem.  
> 
> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
> device and I see no reason whatsoever to start modifying platform drivers just
> because you can do an unbind. I know there are real hot-pluggable devices, and
> getting this right for those is of course important.

That's indeed a very good point. If unbind is not needed by any use case,
the better fix for OMAP3 would be to just prevent it from happening in the
first place.

> >>> The USB subsystem has a a .disconnect() callback that notifies
> >>> the drivers that a device was unbound (likely physically removed).
> >>> The way USB media drivers handle it is by returning -ENODEV to any
> >>> V4L2 call that would try to touch at the hardware after unbound.  
> >  
> 
> In my view the main problem is that the media core is bound to a struct
> device set by the driver that creates the MC. But since the MC gives an
> overview of lots of other (sub)devices the refcount of the media device
> should be increased for any (sub)device that adds itself to the MC and
> decreased for any (sub)device that is removed. Only when the very last
> user goes away can the MC memory be released.
> 
> The memory/refcounting associated with device nodes is unrelated to this:
> once a devnode is unregistered it will be removed in /dev, and once the
> last open fh closes any memory associated with the devnode can be released.
> That will also decrease the refcount to its parent device.
> 
> This also means that it is a bad idea to embed devnodes in a larger struct.
> They should be allocated and freed when the devnode is unregistered and
> the last open filehandle is closed.
> 
> Then the parent's device refcount is decreased, and that may now call its
> release callback if the refcount reaches 0.
> 
> For the media controller's device: any other device driver that needs access
> to it needs to increase that device's refcount, and only when those devices
> are released will they decrease the MC device's refcount.
> 
> And when that refcount goes to 0 can we finally free everything.
> 
> With regards to the opposition to reverting those initial patches, I'm
> siding with Greg KH. Just revert the bloody patches. It worked most of the
> time before those patches, so reverting really won't cause bisect problems.

You're contradicting yourself here ;)

The patches that this patch series is reverting are the ones that
de-embed the devnode struct and fix its lifecycle.

Reverting those patches will cause regressions on hot-pluggable drivers,
preventing them from being unplugged. So, if we're willing to revert, then we
should also revert MC support on them.

> Just revert and build up things as they should.
> 
> Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
> parent pointer for example) and many drivers (including omap3isp) embed
> video_device, which is wrong and can lead to complications.
> 
> I'm to blame for the embedding since I thought that was a good idea at one
> time. I now realized that it isn't. Sorry about that...
> 
> And because the cdev of the video_device doesn't know about the parent
> device it is (I think) possible that the parent device is released before
> the cdev is released. Which can result in major problems.

I agree with you here. IMHO, de-embedding cdev's struct from video_device
seems to be the right thing to do on the V4L2 side too.

Regards,
Mauro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 14:32                             ` Mauro Carvalho Chehab
@ 2016-12-15 14:45                               ` Hans Verkuil
  2016-12-15 15:45                                 ` Mauro Carvalho Chehab
  2016-12-16 16:43                               ` Laurent Pinchart
  1 sibling, 1 reply; 89+ messages in thread
From: Hans Verkuil @ 2016-12-15 14:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Laurent Pinchart, Sakari Ailus, Shuah Khan, Sakari Ailus, linux-media

On 15/12/16 15:32, Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 15:03:36 +0100
> Hans Verkuil <hverkuil@xs4all.nl> escreveu:
>
>> On 15/12/16 13:56, Laurent Pinchart wrote:
>>> Hi Sakari,
>>>
>>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>> Hi Sakari,
>>>>>>>
>
>
>>> There's plenty of way to try and work around the problem in drivers, some more
>>> racy than others, but if we require changes to all platform drivers to fix
>>> this we need to ensure that we get it right, not as half-baked hacks spread
>>> around the whole subsystem.
>>
>> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
>> device and I see no reason whatsoever to start modifying platform drivers just
>> because you can do an unbind. I know there are real hot-pluggable devices, and
>> getting this right for those is of course important.
>
> That's indeed a very good point. If unbind is not needed by any usecase,
> the better fix for OMAP3 would be to just prevent it to happen in the first
> place.
>
>>>>> The USB subsystem has a a .disconnect() callback that notifies
>>>>> the drivers that a device was unbound (likely physically removed).
>>>>> The way USB media drivers handle it is by returning -ENODEV to any
>>>>> V4L2 call that would try to touch at the hardware after unbound.
>>>
>>
>> In my view the main problem is that the media core is bound to a struct
>> device set by the driver that creates the MC. But since the MC gives an
>> overview of lots of other (sub)devices the refcount of the media device
>> should be increased for any (sub)device that adds itself to the MC and
>> decreased for any (sub)device that is removed. Only when the very last
>> user goes away can the MC memory be released.
>>
>> The memory/refcounting associated with device nodes is unrelated to this:
>> once a devnode is unregistered it will be removed in /dev, and once the
>> last open fh closes any memory associated with the devnode can be released.
>> That will also decrease the refcount to its parent device.
>>
>> This also means that it is a bad idea to embed devnodes in a larger struct.
>> They should be allocated and freed when the devnode is unregistered and
>> the last open filehandle is closed.
>>
>> Then the parent's device refcount is decreased, and that may now call its
>> release callback if the refcount reaches 0.
>>
>> For the media controller's device: any other device driver that needs access
>> to it needs to increase that device's refcount, and only when those devices
>> are released will they decrease the MC device's refcount.
>>
>> And when that refcount goes to 0 can we finally free everything.
>>
>> With regards to the opposition to reverting those initial patches, I'm
>> siding with Greg KH. Just revert the bloody patches. It worked most of the
>> time before those patches, so reverting really won't cause bisect problems.
>
> You're contradicting yourself here ;)
>
> The patches that this patch series is reverting are the ones that
> de-embeeds devnode struct and fixes its lifecycle.
>
> Reverting those patches will cause regressions on hot-pluggable drivers,
> preventing them to be unplugged. So, if we're willing to revert, then we
> should also revert MC support on them.

Two options:

1) Revert, then build up a proper solution.
2) Do a big-bang patch switching directly over to the new solution, but that's
very hard to review.
2a) Post the patch series in small chunks on the mailing list (starting with
the reverts), but once we're all happy merge that patch series into a single
big-bang patch and apply that.

As far as I am concerned the whole hotplugging code is broken and has been for
a very long time. We (or at least I :-) ) understand the underlying concepts
a lot better, so we can do a better job. But the transition may well be
painful.

Regards,

	Hans

>
>> Just revert and build up things as they should.
>>
>> Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
>> parent pointer for example) and many drivers (including omap3isp) embed
>> video_device, which is wrong and can lead to complications.
>>
>> I'm to blame for the embedding since I thought that was a good idea at one
>> time. I now realized that it isn't. Sorry about that...
>>
>> And because the cdev of the video_device doesn't know about the parent
>> device it is (I think) possible that the parent device is released before
>> the cdev is released. Which can result in major problems.
>
> I agree with you here. IMHO, de-embeeding cdev's parent struct from
> video_device seems to be the right thing to do at V4L2 side too.
>
> Regards,
> Mauro


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 14:03                           ` Hans Verkuil
  2016-12-15 14:32                             ` Mauro Carvalho Chehab
@ 2016-12-15 14:45                             ` Shuah Khan
  2016-12-15 15:26                               ` Hans Verkuil
  2016-12-23 17:27                               ` Laurent Pinchart
  2016-12-16 15:07                             ` Sakari Ailus
  2 siblings, 2 replies; 89+ messages in thread
From: Shuah Khan @ 2016-12-15 14:45 UTC (permalink / raw)
  To: Hans Verkuil, Laurent Pinchart, Sakari Ailus
  Cc: Mauro Carvalho Chehab, Sakari Ailus, linux-media, Shuah Khan

On 12/15/2016 07:03 AM, Hans Verkuil wrote:
> On 15/12/16 13:56, Laurent Pinchart wrote:
>> Hi Sakari,
>>
>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>>> Hi Sakari,
>>>>>>
>>>>>> I answered you point to point below, but I suspect that you missed how
>>>>>> the current approach works. So, I decided to write a quick summary
>>>>>> here.
>>>>>>
>>>>>> The character devices /dev/media? are created via cdev, with relies on
>>>>>> a kobject per device, with has an embedded struct kref inside.
>>>>>>
>>>>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>>>>> struct device, when the code does:
>>>>>>   devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>>>>
>>>>>> before calling cdev_add().
>>>>>>
>>>>>> The current lifetime management is actually based on cdev's kobject's
>>>>>> refcount, provided by its embedded kref.
>>>>>>
>>>>>> The kref warrants that any data associated with /dev/media0 won't be
>>>>>> freed if there are any pending system call. In other words, when
>>>>>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>>>>>> and will call kobject_put().
>>>>>>
>>>>>> If the refcount is zero, it will call devnode->dev.release(). If the
>>>>>> kobject refcount is not zero, the data won't be freed.
>>>>>>
>>>>>> So, in the best case scenario, there's no opened file descriptors
>>>>>> by the time media device node is unregistered. So, it will free
>>>>>> everything.
>>>>>>
>>>>>> In the worse case scenario, e. g. when the driver is removed or
>>>>>> unbind while /dev/media0 has some opened file descriptor(s),
>>>>>> the cdev logic will do the proper lifetime management.
>>>>>>
>>>>>> On such case, /dev/media0 disappears from the file system, so another
>>>>>> open is not possible anymore. The data structures will remain
>>>>>> allocated until all associated file descriptors are not closed.
>>>>>>
>>>>>> When all file descriptors are closed, the data will be freed.
>>>>>>
>>>>>> On that time, it will call an optional dev.release() callback,
>>>>>> responsible to free any other data struct that the driver allocated.
>>>>>
>>>>> The patchset does not change this. It's not a question of the
>>>>> media_devnode struct either. That's not an issue.
>>>>>
>>>>> The issue is rather what else can be accessed through the media device
>>>>> and other interfaces. As IOCTLs are not serialised with device removal
>>>>> (which now releases much of the data structures)
>>>>
>>>> Huh? ioctls are serialized with struct device removal. The Driver core
>>>> warrants that.
>>>
>>> How?
>>>
>>> As far as I can tell, there's nothing in the way of an IOCTL being in
>>> progress on a character device which is registered by the driver for a
>>> hardware device which is being removed.
>>>
>>> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
>>> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
>>> are taken during that path, which I believe is by design.
> 
> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
> on release(). Thus ensuring that the cdev can never be removed while in an
> ioctl.
> 
>>>
>>>>> there's a high chance of accessing
>>>>> released memory (or mutexes that have been already destroyed). An
>>>>> example of that is here, stopping a running pipeline after unbinding
>>>>> the device. What happens there is that the media device is released
>>>>> whilst it's in use through the video device.
>>>>>
>>>>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
>>>>
>>>> It is not clear from the logs what the driver tried to do, but
>>>> that sounds like a driver's bug, with was not prepared to properly
>>>> handle unbinds.
>>>>
>>>> The problem here is that isp_video_release() is called by V4L2
>>>> release logic, and not by the MC one:
>>>>
>>>> static const struct v4l2_file_operations isp_video_fops = {
>>>>       .owner          = THIS_MODULE,
>>>>       .open           = isp_video_open,
>>>>       .release        = isp_video_release,
>>>>       .poll           = vb2_fop_poll,
>>>>       .unlocked_ioctl = video_ioctl2,
>>>>       .mmap           = vb2_fop_mmap,
>>>> };
>>>>
>>>> It seems that the driver's logic allows it to be called before or
>>>> after destroying the MC.
>>>>
>>>> Assuming that, if the OMAP3 driver is not used it works,
>>>> it means that, if the isp_video_release() is called
>>>> first, no errors will happen, but if MC is destroyed before
>>>> V4L2 call to its .release() callback, as there's no logic at the
>>>> driver that would detect it, isp_video_release() will be calling
>>>> isp_video_streamoff(), with depends on the MC to work.
>>>>
>>>> On a first glance, I can see two ways of fixing it:
>>>>
>>>> 1) to increment devnode's device kobject refcount at OMAP3 .probe(),
>>>> decrementing it only at isp_video_release(). That will ensure that
>>>> MC will only be removed after V4L2 removal.
>>
>> As soon as you have to dig deep in a structure to find a reference counter and
>> increment it, bypassing all the API layers, you can be entirely sure that the
>> solution is wrong.
> 
> Indeed.
> 
>>
>>>> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
>>>> inside the MC .release() callback.
>>>
>>> This is a fair suggestion, indeed. Let me see what could be done there.
>>> Albeit this is just *one* of the existing issues. It will not address all
>>> problems fixed by the patchset.
>>
>> We need to stop the hardware at .remove() time. That should not be linked to a
>> videodev, v4l2_device or media_device .release() callback. When the .remove()
>> callback returns the driver is not allowed to touch the hardware anymore. In
>> particular, power domains might clocks or power supplies, leading to invalid
>> access faults if we try to access hardware registers.
> 
> Correct.
> 
>>
>> USB devices get help from the USB core that cancels all USB operations in
>> progress when they're disconnected. Platform devices don't have it as easy,
>> and need to implement everything themselves. We thus need to stop the
>> hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF ioctl at
>> .remove() time.
> 
> Please don't. This shouldn't be done automatically.
> 
>> That could introduce other races between .remove() and the
>> userspace API. A better solution is to make sure the objects that are needed
>> at .release() time of the device node are all reference-counted and only
>> released when the last reference goes away.
>>
>> There's plenty of way to try and work around the problem in drivers, some more
>> racy than others, but if we require changes to all platform drivers to fix
>> this we need to ensure that we get it right, not as half-baked hacks spread
>> around the whole subsystem.
> 
> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
> device and I see no reason whatsoever to start modifying platform drivers just
> because you can do an unbind. I know there are real hot-pluggable devices, and
> getting this right for those is of course important.

This was my first reaction when I saw this RFC series. None of the platform
drivers are designed to be unbound. Making core changes based on such a
driver would make the core very complex.

We can't even reproduce the problem on other drivers.

> 
> If the omap3 is used as a testbed, then that's fine by me, but even then I
> probably wouldn't want the omap3 code that makes this possible in the kernel.
> It's just additional code for no purpose.

I agree with Hans. Why are we using the most complex case as a reference driver
and basing core changes on it, which will force changes to all the drivers
that use mc-core?

> 
>>>> That could be done by overwriting the dev.release() callback at
>>>> omap3 driver, as I discussed on my past e-mails, and flagging the
>>>> driver that it should not accept streamon anymore, as the hardware
>>>> is being disconnected.
>>>
>>> A mutex will be needed to serialise this with starting streaming.
>>>
>>>> Btw, that explains a lot why Shuah can't reproduce the stuff you're
>>>> complaining on her USB hardware.
>>>>
>>>> The USB subsystem has a .disconnect() callback that notifies
>>>> the drivers that a device was unbound (likely physically removed).
>>>> The way USB media drivers handle it is by returning -ENODEV to any
>>>> V4L2 call that would try to touch at the hardware after unbound.
>>
> 
> In my view the main problem is that the media core is bound to a struct
> device set by the driver that creates the MC. But since the MC gives an
> overview of lots of other (sub)devices the refcount of the media device
> should be increased for any (sub)device that adds itself to the MC and
> decreased for any (sub)device that is removed. Only when the very last
> user goes away can the MC memory be released.

Correct. The Media Device Allocator API work I did allows creating the media
device on the parent USB device so that the media and sound drivers can share
it. It ref-counts the media device, and the media device is unregistered only
when the last driver unregisters it.
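
To illustrate the sharing idea, here is a minimal userspace sketch, not the
actual allocator code. All names in it (shared_mdev, shared_mdev_get(),
shared_mdev_put()) are hypothetical and it is not thread-safe; real code
would take a lock around the list and use a kref for the count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; these are not the allocator API's names. */
struct shared_mdev {
	const void *parent;	/* key: the common parent device */
	int users;		/* number of drivers sharing this instance */
	struct shared_mdev *next;
};

static struct shared_mdev *shared_list;

/* Return the instance for @parent, creating it on first use. */
static struct shared_mdev *shared_mdev_get(const void *parent)
{
	struct shared_mdev *m;

	for (m = shared_list; m; m = m->next) {
		if (m->parent == parent) {
			m->users++;
			return m;
		}
	}

	m = calloc(1, sizeof(*m));
	if (!m)
		return NULL;
	m->parent = parent;
	m->users = 1;
	m->next = shared_list;
	shared_list = m;
	return m;
}

/* Drop one user; returns 1 if this was the last user and it was freed. */
static int shared_mdev_put(struct shared_mdev *m)
{
	struct shared_mdev **p;

	if (--m->users > 0)
		return 0;

	/* Last user: unlink from the list before freeing. */
	for (p = &shared_list; *p; p = &(*p)->next) {
		if (*p == m) {
			*p = m->next;
			break;
		}
	}
	free(m);
	return 1;
}
```

In this model a video driver and an audio driver that pass the same parent
pointer get the same instance, and it only goes away when the second of them
calls shared_mdev_put().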

There is another aspect to explore regarding media device and the graph.

Should all the entities stick around until all references to media
device are gone? If an application has /dev/media open, does that
mean all entities should not be free'd until this app. exits? What
should happen if an app. is streaming? Should the graph stay intact
until the app. exits?

   If yes, this would pose problems when we have multiple drivers bound
   to the media device. When audio driver goes away for example, it should
   be allowed to delete its entities.

The approach the current mc-core takes is that the media_device and media_devnode
stick around, but entities can be added and removed during media_device
lifetime.

If an app. is still running when media_device is unregistered, media_device
isn't released until the last reference goes away and ioctls can check if
media_device is registered or not.

We have to decide on the larger lifetime question surrounding media_device
and graph as well.

> 
> The memory/refcounting associated with device nodes is unrelated to this:
> once a devnode is unregistered it will be removed in /dev, and once the
> last open fh closes any memory associated with the devnode can be released.
> That will also decrease the refcount to its parent device.
> 
> This also means that it is a bad idea to embed devnodes in a larger struct.
> They should be allocated and freed when the devnode is unregistered and
> the last open filehandle is closed.
> 
> Then the parent's device refcount is decreased, and that may now call its
> release callback if the refcount reaches 0.
> 
> For the media controller's device: any other device driver that needs access
> to it needs to increase that device's refcount, and only when those devices
> are released will they decrease the MC device's refcount.
> 
> And when that refcount goes to 0 can we finally free everything.
> 
> With regards to the opposition to reverting those initial patches, I'm
> siding with Greg KH. Just revert the bloody patches. It worked most of the
> time before those patches, so reverting really won't cause bisect problems.
> 
> Just revert and build up things as they should.
> 
> Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
> parent pointer for example) and many drivers (including omap3isp) embed
> video_device, which is wrong and can lead to complications.
> 
> I'm to blame for the embedding since I thought that was a good idea at one
> time. I now realized that it isn't. Sorry about that...
> 
> And because the cdev of the video_device doesn't know about the parent
> device it is (I think) possible that the parent device is released before
> the cdev is released. Which can result in major problems.
> 
> Regards,
> 
>     Hans

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 10:39                         ` Laurent Pinchart
@ 2016-12-15 14:56                           ` Shuah Khan
  2016-12-16 16:58                             ` Laurent Pinchart
  0 siblings, 1 reply; 89+ messages in thread
From: Shuah Khan @ 2016-12-15 14:56 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Mauro Carvalho Chehab, Sakari Ailus, Sakari Ailus, linux-media,
	hverkuil, shuah Khan

On 12/15/2016 03:39 AM, Laurent Pinchart wrote:
> Hello,
> 
> On Tuesday 13 Dec 2016 15:23:53 Shuah Khan wrote:
>> On 12/13/2016 05:24 AM, Mauro Carvalho Chehab wrote:
>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>> Hi Sakari,
>>>>>
>>>>> I answered you point to point below, but I suspect that you missed how
>>>>> the current approach works. So, I decided to write a quick summary here.
>>>>>
>>>>> The character devices /dev/media? are created via cdev, which relies on a
>>>>> kobject per device, which has an embedded struct kref inside.
>>>>>
>>>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>>>>
>>>>> struct device, when the code does:
>>>>> 	devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>>>
>>>>> before calling cdev_add().
>>>>>
>>>>> The current lifetime management is actually based on cdev's kobject's
>>>>> refcount, provided by its embedded kref.
>>>>>
>>>>> The kref warrants that any data associated with /dev/media0 won't be
>>>>> freed if there are any pending system call. In other words, when
>>>>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>>>>> and
>>>>> will call kobject_put().
>>>>>
>>>>> If the refcount is zero, it will call devnode->dev.release(). If the
>>>>> kobject refcount is not zero, the data won't be freed.
>>>>>
>>>>> So, in the best case scenario, there's no opened file descriptors
>>>>> by the time media device node is unregistered. So, it will free
>>>>> everything.
>>>>>
>>>>> In the worst-case scenario, e.g. when the driver is removed or
>>>>> unbound while /dev/media0 has some open file descriptor(s),
>>>>> the cdev logic will do the proper lifetime management.
>>>>>
>>>>> On such case, /dev/media0 disappears from the file system, so another
>>>>> open
>>>>> is not possible anymore. The data structures will remain allocated until
>>>>> all associated file descriptors are closed.
>>>>>
>>>>> When all file descriptors are closed, the data will be freed.
>>>>>
>>>>> On that time, it will call an optional dev.release() callback,
>>>>> responsible to free any other data struct that the driver allocated.
>>>>
>>>> The patchset does not change this. It's not a question of the
>>>> media_devnode struct either. That's not an issue.
>>>>
>>>> The issue is rather what else can be accessed through the media device
>>>> and other interfaces. As IOCTLs are not serialised with device removal
>>>> (which now releases much of the data structures)
>>>
>>> Huh? ioctls are serialized with struct device removal. The Driver core
>>> warrants that.
> 
> Code references please.
>  
>>>> there's a high chance of accessing
>>>> released memory (or mutexes that have been already destroyed). An example
>>>> of that is here, stopping a running pipeline after unbinding the device.
>>>> What happens there is that the media device is released whilst it's in
>>>> use through the video device.
>>>>
>>>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
>>>
>>> It is not clear from the logs what the driver tried to do, but
>>> that sounds like a driver's bug, which was not prepared to properly
>>> handle unbinds.
>>>
>>> The problem here is that isp_video_release() is called by V4L2
>>> release logic, and not by the MC one:
>>>
>>> static const struct v4l2_file_operations isp_video_fops = {
>>> 	.owner		= THIS_MODULE,
>>> 	.open		= isp_video_open,
>>> 	.release	= isp_video_release,
>>> 	.poll		= vb2_fop_poll,
>>> 	.unlocked_ioctl	= video_ioctl2,
>>> 	.mmap		= vb2_fop_mmap,
>>> };
>>>
>>> It seems that the driver's logic allows it to be called before or
>>> after destroying the MC.
>>
>> Right isp_video_release() will definitely be called after driver is
>> gone which means media device is gone and the device itself.
> 
> Certainly not after the driver is gone (as in the module being unloaded from 
> memory), but it can be called after the device is unbound from the driver, 
> yes.
> 
>> Both au0828 and em28xx have these release handlers. Neither one uses
>> devm resource for their device structs.
> 
> And no driver exposing objects to userspace-accessible code paths should. I've 
> been pointing at how devm_kzalloc() is abused for more than a year now, it's 
> nice to see that people slowly start listening.
> 
>> Also, both em28xx and au0828 keep disconnected state and have logic
>> to detect the state of the driver and device. em28xx holds reference
>> to v4l2->ref
> 
> That's very, very wrong. The v4l2_device::ref field must *not* be touched by 
> drivers. Acquiring and releasing references to v4l2_device instances must be 
> done with v4l2_device_get() and v4l2_device_put(), and the structure has a 
> release handler that drivers can use. Why do people write such horrible code 
> that pokes at private fields ?
> 
>> and releases the reference in em28xx_v4l2_close() which is
>> its v4l2_file_operations .release handler. It also makes sure to not
>> touch device hardware if device is disconnected.
>>
>> Also, media graph access is done only when it has a valid media_device.
>> au0828 allocates media_device struct and it gets free'd when it does
>> its unregister sequence. Subsequent calls will check if it is null.
> 
> This is very wrong too. Don't try to handle data structures being pulled from 
> under the driver's feet at random times. At best you will end up with races. 
> Data structures must instead be properly refcounted.
> 
>> It also does checks to see if media_device is registered or not in
>> some cases.
>>
>> isp_video_release() isn't safe to be called after isp device is gone,
>> let alone media_device. Since isp is a devm resource, it is long
>> gone when device_release() releases managed resources.
>>
>> I agree with Mauro that this is a driver problem.
> 
> No. There *is* a driver problem caused by devm_kzalloc() usage, and that 
> problem *must* be fixed, but the media device life time management is also 
> completely broken in core code.
> 
>> Mauro and I did lot of work to get the USB drivers (em28xx and au0828) to
>> handle disconnect and unbind cases even before the media controller support
>> was added to them.
>>
>> I think what needs to happen is:
>>
>> 1. Remove devm use from omap3
> 
> Absolutely.
> 
>> 2. Make sure media graph isn't accessed after media_device is unregistered
> 
> No way. We need to access the graph from the release handlers of the 
> userspace-exposed structures (videodev and possibly media_device). The 
> media_device structure must *not* disappear at unregistration time.
> 
>> 3. Take reference to v4l2 device to be able to make sanity checks from
>>    isp_video_release() to determine if media_device is still around and
>>    then do stop stream etc. It has to keep state.
>>
>> I agree with Mauro that this is a driver problem. Mauro and I did lot
>> of work to get the USB drivers (em28xx and au0828) to handle disconnect
>> and unbind cases even before the media controller support was added to
>> them.
>>
>> Please don't pursue this RFC series that makes mc-core changes until
>> omap3 driver problems are addressed. There is no need to change the
>> core unless it is necessary.
> 
> It is necessary as has been explained countless times, and will become more 
> and more necessary as media_device instances get shared between multiple 
> drivers, which is currently attempted *without* reference counting.

You are probably forgetting the Media Device Allocator API work I did
to make media_device sharable across media and audio drivers. Sakari's
patches don't address the sharable need. I have been asking Sakari to
use Media Device Allocator API in his patch series for allocating media
device.

I discussed the conflicts between the work I am doing and Sakari's series
to find a common ground. But it doesn't look like we are going to get there.

thanks,
-- Shuah

> 
>> I would be happy to help, unfortunately I don't have a omap3 device
>> to fix and test problems. I am unable to find any omap3 devices. The
>> one I have isn't good.
> 


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 14:45                             ` Shuah Khan
@ 2016-12-15 15:26                               ` Hans Verkuil
  2016-12-15 16:06                                 ` Shuah Khan
                                                   ` (2 more replies)
  2016-12-23 17:27                               ` Laurent Pinchart
  1 sibling, 3 replies; 89+ messages in thread
From: Hans Verkuil @ 2016-12-15 15:26 UTC (permalink / raw)
  To: Shuah Khan, Laurent Pinchart, Sakari Ailus
  Cc: Mauro Carvalho Chehab, Sakari Ailus, linux-media

On 15/12/16 15:45, Shuah Khan wrote:
> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
>> On 15/12/16 13:56, Laurent Pinchart wrote:
>>> Hi Sakari,
>>>
>>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>> Hi Sakari,
>>>>>>>
>>>>>>> I answered you point to point below, but I suspect that you missed how
>>>>>>> the current approach works. So, I decided to write a quick summary
>>>>>>> here.
>>>>>>>
>>>>>>> The character devices /dev/media? are created via cdev, which relies on
>>>>>>> a kobject per device, which has an embedded struct kref inside.
>>>>>>>
>>>>>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>>>>>> struct device, when the code does:
>>>>>>>   devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>>>>>
>>>>>>> before calling cdev_add().
>>>>>>>
>>>>>>> The current lifetime management is actually based on cdev's kobject's
>>>>>>> refcount, provided by its embedded kref.
>>>>>>>
>>>>>>> The kref warrants that any data associated with /dev/media0 won't be
>>>>>>> freed if there are any pending system call. In other words, when
>>>>>>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>>>>>>> and will call kobject_put().
>>>>>>>
>>>>>>> If the refcount is zero, it will call devnode->dev.release(). If the
>>>>>>> kobject refcount is not zero, the data won't be freed.
>>>>>>>
>>>>>>> So, in the best case scenario, there's no opened file descriptors
>>>>>>> by the time media device node is unregistered. So, it will free
>>>>>>> everything.
>>>>>>>
>>>>>>> In the worst-case scenario, e.g. when the driver is removed or
>>>>>>> unbound while /dev/media0 has some open file descriptor(s),
>>>>>>> the cdev logic will do the proper lifetime management.
>>>>>>>
>>>>>>> On such case, /dev/media0 disappears from the file system, so another
>>>>>>> open is not possible anymore. The data structures will remain
>>>>>>> allocated until all associated file descriptors are closed.
>>>>>>>
>>>>>>> When all file descriptors are closed, the data will be freed.
>>>>>>>
>>>>>>> On that time, it will call an optional dev.release() callback,
>>>>>>> responsible to free any other data struct that the driver allocated.
>>>>>>
>>>>>> The patchset does not change this. It's not a question of the
>>>>>> media_devnode struct either. That's not an issue.
>>>>>>
>>>>>> The issue is rather what else can be accessed through the media device
>>>>>> and other interfaces. As IOCTLs are not serialised with device removal
>>>>>> (which now releases much of the data structures)
>>>>>
>>>>> Huh? ioctls are serialized with struct device removal. The Driver core
>>>>> warrants that.
>>>>
>>>> How?
>>>>
>>>> As far as I can tell, there's nothing in the way of an IOCTL being in
>>>> progress on a character device which is registered by the driver for a
>>>> hardware device which is being removed.
>>>>
>>>> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
>>>> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
>>>> are taken during that path, which I believe is by design.
>>
>> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
>> on release(). Thus ensuring that the cdev can never be removed while in an
>> ioctl.
>>
>>>>
>>>>>> there's a high chance of accessing
>>>>>> released memory (or mutexes that have been already destroyed). An
>>>>>> example of that is here, stopping a running pipeline after unbinding
>>>>>> the device. What happens there is that the media device is released
>>>>>> whilst it's in use through the video device.
>>>>>>
>>>>>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
>>>>>
>>>>> It is not clear from the logs what the driver tried to do, but
>>>>> that sounds like a driver's bug, which was not prepared to properly
>>>>> handle unbinds.
>>>>>
>>>>> The problem here is that isp_video_release() is called by V4L2
>>>>> release logic, and not by the MC one:
>>>>>
>>>>> static const struct v4l2_file_operations isp_video_fops = {
>>>>>       .owner          = THIS_MODULE,
>>>>>       .open           = isp_video_open,
>>>>>       .release        = isp_video_release,
>>>>>       .poll           = vb2_fop_poll,
>>>>>       .unlocked_ioctl = video_ioctl2,
>>>>>       .mmap           = vb2_fop_mmap,
>>>>> };
>>>>>
>>>>> It seems that the driver's logic allows it to be called before or
>>>>> after destroying the MC.
>>>>>
>>>>> Assuming that, if the OMAP3 driver is not used it works,
>>>>> it means that, if the isp_video_release() is called
>>>>> first, no errors will happen, but if MC is destroyed before
>>>>> V4L2 call to its .release() callback, as there's no logic at the
>>>>> driver that would detect it, isp_video_release() will be calling
>>>>> isp_video_streamoff(), which depends on the MC to work.
>>>>>
>>>>> On a first glance, I can see two ways of fixing it:
>>>>>
>>>>> 1) to increment devnode's device kobject refcount at OMAP3 .probe(),
>>>>> decrementing it only at isp_video_release(). That will ensure that
>>>>> MC will only be removed after V4L2 removal.
>>>
>>> As soon as you have to dig deep in a structure to find a reference counter and
>>> increment it, bypassing all the API layers, you can be entirely sure that the
>>> solution is wrong.
>>
>> Indeed.
>>
>>>
>>>>> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
>>>>> inside the MC .release() callback.
>>>>
>>>> This is a fair suggestion, indeed. Let me see what could be done there.
>>>> Albeit this is just *one* of the existing issues. It will not address all
>>>> problems fixed by the patchset.
>>>
>>> We need to stop the hardware at .remove() time. That should not be linked to a
>>> videodev, v4l2_device or media_device .release() callback. When the .remove()
>>> callback returns the driver is not allowed to touch the hardware anymore. In
>>> particular, power domains might cut clocks or power supplies, leading to invalid
>>> access faults if we try to access hardware registers.
>>
>> Correct.
>>
>>>
>>> USB devices get help from the USB core that cancels all USB operations in
>>> progress when they're disconnected. Platform devices don't have it as easy,
>>> and need to implement everything themselves. We thus need to stop the
>>> hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF ioctl at
>>> .remove() time.
>>
>> Please don't. This shouldn't be done automatically.
>>
>>> That could introduce other races between .remove() and the
>>> userspace API. A better solution is to make sure the objects that are needed
>>> at .release() time of the device node are all reference-counted and only
>>> released when the last reference goes away.
>>>
>>> There's plenty of ways to try and work around the problem in drivers, some more
>>> racy than others, but if we require changes to all platform drivers to fix
>>> this we need to ensure that we get it right, not as half-baked hacks spread
>>> around the whole subsystem.
>>
>> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
>> device and I see no reason whatsoever to start modifying platform drivers just
>> because you can do an unbind. I know there are real hot-pluggable devices, and
>> getting this right for those is of course important.
>
> This was my first reaction when I saw this RFC series. None of the platform
> drivers are designed to be unbound. Making core changes based on such a
> driver would make the core very complex.
>
> We can't even reproduce the problem on other drivers.
>
>>
>> If the omap3 is used as a testbed, then that's fine by me, but even then I
>> probably wouldn't want the omap3 code that makes this possible in the kernel.
>> It's just additional code for no purpose.
>
> I agree with Hans. Why are we using the most complex case as a reference driver
> and using it as the basis for core changes that will force changes to all the
> drivers that use mc-core?
>
>>
>>>>> That could be done by overwriting the dev.release() callback at
>>>>> omap3 driver, as I discussed on my past e-mails, and flagging the
>>>>> driver that it should not accept streamon anymore, as the hardware
>>>>> is being disconnected.
>>>>
>>>> A mutex will be needed to serialise this with starting streaming.
>>>>
>>>>> Btw, that explains a lot why Shuah can't reproduce the stuff you're
>>>>> complaining on her USB hardware.
>>>>>
>>>>> The USB subsystem has a .disconnect() callback that notifies
>>>>> the drivers that a device was unbound (likely physically removed).
>>>>> The way USB media drivers handle it is by returning -ENODEV to any
>>>>> V4L2 call that would try to touch at the hardware after unbound.
>>>
>>
>> In my view the main problem is that the media core is bound to a struct
>> device set by the driver that creates the MC. But since the MC gives an
>> overview of lots of other (sub)devices the refcount of the media device
>> should be increased for any (sub)device that adds itself to the MC and
>> decreased for any (sub)device that is removed. Only when the very last
>> user goes away can the MC memory be released.
>
> Correct. The Media Device Allocator API work I did allows creating the media
> device on the parent USB device so that the media and sound drivers can share
> it. It ref-counts the media device, and the media device is unregistered only
> when the last driver unregisters it.
>
> There is another aspect to explore regarding media device and the graph.
>
> Should all the entities stick around until all references to media
> device are gone? If an application has /dev/media open, does that
> mean all entities should not be free'd until this app. exits? What
> should happen if an app. is streaming? Should the graph stay intact
> until the app. exits?

Yes, everything must stay around until the last user has disappeared.

In general unplugs can happen at any time. So applications can be in the middle
of an ioctl, and removing memory during that time is just impossible.

On unplug you:

1) stop any HW DMA (highly device dependent)
2) wake up any filehandles that wait for an event
3) unregister any device nodes

Then just sit back and wait for refcounts to go down as filehandles are closed
by the application.

Note: the v4l2/media/cec/IR/whatever core is typically responsible for rejecting
any ioctls/mmap/etc. once the device node has been unregistered. The only valid
file operation is release().
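
As a rough illustration of that sequence, here is a minimal userspace sketch
(not kernel code). All names are hypothetical, the plain int stands in for a
kref and is not thread-safe, and step 2 (waking up waiting filehandles) is
left out to keep it short.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_dev {
	int refcount;		/* stands in for a kref; not thread-safe here */
	bool registered;	/* cleared when the device node is unregistered */
	bool dma_running;	/* stands in for hardware activity */
	bool *freed;		/* lets a caller observe the final release */
};

static struct model_dev *model_dev_alloc(bool *freed)
{
	struct model_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;	/* reference held by the driver binding */
	dev->registered = true;
	dev->dma_running = true;
	dev->freed = freed;
	*freed = false;
	return dev;
}

static void model_dev_get(struct model_dev *dev)
{
	dev->refcount++;
}

static void model_dev_put(struct model_dev *dev)
{
	if (--dev->refcount == 0) {
		*dev->freed = true;
		free(dev);
	}
}

/* open(): each file handle pins the device. */
static int model_open(struct model_dev *dev)
{
	if (!dev->registered)
		return -ENODEV;
	model_dev_get(dev);
	return 0;
}

/* ioctl(): rejected once the node is unregistered. */
static int model_ioctl(struct model_dev *dev)
{
	return dev->registered ? 0 : -ENODEV;
}

/* release(): the only operation still valid after unplug. */
static void model_release(struct model_dev *dev)
{
	model_dev_put(dev);
}

/* Unplug: stop the hardware, unregister the node, drop the driver's ref. */
static void model_unplug(struct model_dev *dev)
{
	dev->dma_running = false;
	dev->registered = false;
	model_dev_put(dev);
}
```

With one filehandle open, model_unplug() leaves the memory in place; further
model_ioctl() and model_open() calls fail with -ENODEV, and the memory is
freed only by the final model_release().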

>
>    If yes, this would pose problems when we have multiple drivers bound
>    to the media device. When audio driver goes away for example, it should
>    be allowed to delete its entities.

Only if you can safely remove it from the topology data structures while
being 100% certain that nobody can ever access it. I'm not sure if that is
the case. Actually, looking at e.g. adv7604.c it does media_entity_cleanup(&sd->entity);
in remove(), which is an empty function, so there doesn't appear to be any attempt
to safely clean up an entity (i.e. make sure no running media ioctl can
access it or call ops).

This probably will need to be serialized with the graph_mutex lock.

>
> The approach the current mc-core takes is that the media_device and media_devnode
> stick around, but entities can be added and removed during media_device
> lifetime.

Seems reasonable. But the removal needs to be done carefully, and that doesn't
seem to be the case now (unless adv7604.c is just buggy).

>
> If an app. is still running when media_device is unregistered, media_device
> isn't released until the last reference goes away and ioctls can check if
> media_device is registered or not.
>
> We have to decide on the larger lifetime question surrounding media_device
> and graph as well.

I don't think there is any choice but to keep it all alive until the last
reference goes away.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 14:45                               ` Hans Verkuil
@ 2016-12-15 15:45                                 ` Mauro Carvalho Chehab
  2016-12-15 16:07                                   ` Hans Verkuil
  2016-12-16 16:47                                   ` Laurent Pinchart
  0 siblings, 2 replies; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-15 15:45 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Laurent Pinchart, Sakari Ailus, Shuah Khan, Sakari Ailus, linux-media

Em Thu, 15 Dec 2016 15:45:22 +0100
Hans Verkuil <hverkuil@xs4all.nl> escreveu:

> On 15/12/16 15:32, Mauro Carvalho Chehab wrote:
> > Em Thu, 15 Dec 2016 15:03:36 +0100
> > Hans Verkuil <hverkuil@xs4all.nl> escreveu:
> >  
> >> On 15/12/16 13:56, Laurent Pinchart wrote:  
> >>> Hi Sakari,
> >>>
> >>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:  
> >>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:  
> >>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:  
> >>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:  
> >>>>>>> Hi Sakari,
> >>>>>>>  
> >
> >  
> >>> There's plenty of ways to try and work around the problem in drivers, some more
> >>> racy than others, but if we require changes to all platform drivers to fix
> >>> this we need to ensure that we get it right, not as half-baked hacks spread
> >>> around the whole subsystem.  
> >>
> >> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
> >> device and I see no reason whatsoever to start modifying platform drivers just
> >> because you can do an unbind. I know there are real hot-pluggable devices, and
> >> getting this right for those is of course important.  
> >
> > That's indeed a very good point. If unbind is not needed by any usecase,
> > the better fix for OMAP3 would be to just prevent it from happening in the first
> > place.
> >  
> >>>>> The USB subsystem has a .disconnect() callback that notifies
> >>>>> the drivers that a device was unbound (likely physically removed).
> >>>>> The way USB media drivers handle it is by returning -ENODEV to any
> >>>>> V4L2 call that would try to touch at the hardware after unbound.  
> >>>  
> >>
> >> In my view the main problem is that the media core is bound to a struct
> >> device set by the driver that creates the MC. But since the MC gives an
> >> overview of lots of other (sub)devices the refcount of the media device
> >> should be increased for any (sub)device that adds itself to the MC and
> >> decreased for any (sub)device that is removed. Only when the very last
> >> user goes away can the MC memory be released.
> >>
> >> The memory/refcounting associated with device nodes is unrelated to this:
> >> once a devnode is unregistered it will be removed in /dev, and once the
> >> last open fh closes any memory associated with the devnode can be released.
> >> That will also decrease the refcount to its parent device.
> >>
> >> This also means that it is a bad idea to embed devnodes in a larger struct.
> >> They should be allocated and freed when the devnode is unregistered and
> >> the last open filehandle is closed.
> >>
> >> Then the parent's device refcount is decreased, and that may now call its
> >> release callback if the refcount reaches 0.
> >>
> >> For the media controller's device: any other device driver that needs access
> >> to it needs to increase that device's refcount, and only when those devices
> >> are released will they decrease the MC device's refcount.
> >>
> >> And when that refcount goes to 0 can we finally free everything.
> >>
> >> With regards to the opposition to reverting those initial patches, I'm
> >> siding with Greg KH. Just revert the bloody patches. It worked most of the
> >> time before those patches, so reverting really won't cause bisect problems.  
> >
> > You're contradicting yourself here ;)
> >
> > The patches that this patch series is reverting are the ones that
> > de-embed the devnode struct and fix its lifecycle.
> >
> > Reverting those patches will cause regressions on hot-pluggable drivers,
> > preventing them from being unplugged. So, if we're willing to revert, then we
> > should also revert MC support on them.  
> 
> Two options:
> 
> 1) Revert, then build up a proper solution.

Reverting is a regression, as we'll strip off the MC support from the
existing devices. We would also need to revert a lot more than just those
3 patches.

> 2) Do a big-bang patch switching directly over to the new solution, but that's
> very hard to review.
> 2a) Post the patch series in small chunks on the mailinglist (starting with the
> reverts), but once we're all happy merge that patch series into a single big-bang
> patch and apply that.

We could do that, but so far, what has been submitted is incomplete,
as it only touches a single driver (which doesn't require hot-plugging),
breaking all the other ones.

> As far as I am concerned the whole hotplugging code is broken and has been for
> a very long time. We (or at least I :-) ) understand the underlying concepts
> a lot better, so we can do a better job. But the transition may well be
> painful.

It is not broken currently on the devices that require hotplugging.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 15:26                               ` Hans Verkuil
@ 2016-12-15 16:06                                 ` Shuah Khan
  2016-12-15 16:28                                   ` Hans Verkuil
  2016-12-23 18:13                                   ` Laurent Pinchart
  2016-12-15 17:08                                 ` Mauro Carvalho Chehab
  2016-12-23 17:48                                 ` Laurent Pinchart
  2 siblings, 2 replies; 89+ messages in thread
From: Shuah Khan @ 2016-12-15 16:06 UTC (permalink / raw)
  To: Hans Verkuil, Laurent Pinchart, Sakari Ailus
  Cc: Mauro Carvalho Chehab, Sakari Ailus, linux-media, Shuah Khan

On 12/15/2016 08:26 AM, Hans Verkuil wrote:
> On 15/12/16 15:45, Shuah Khan wrote:
>> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
>>> On 15/12/16 13:56, Laurent Pinchart wrote:
>>>> Hi Sakari,
>>>>
>>>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>>>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>>> Hi Sakari,
>>>>>>>>
>>>>>>>> I answered you point to point below, but I suspect that you missed how
>>>>>>>> the current approach works. So, I decided to write a quick summary
>>>>>>>> here.
>>>>>>>>
>>>>>>>> The character devices /dev/media? are created via cdev, which relies on
>>>>>>>> a kobject per device, which has an embedded struct kref inside.
>>>>>>>>
>>>>>>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>>>>>>> struct device, when the code does:
>>>>>>>>   devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>>>>>>
>>>>>>>> before calling cdev_add().
>>>>>>>>
>>>>>>>> The current lifetime management is actually based on cdev's kobject's
>>>>>>>> refcount, provided by its embedded kref.
>>>>>>>>
>>>>>>>> The kref warrants that any data associated with /dev/media0 won't be
>>>>>>>> freed if there are any pending system call. In other words, when
>>>>>>>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>>>>>>>> and will call kobject_put().
>>>>>>>>
>>>>>>>> If the refcount is zero, it will call devnode->dev.release(). If the
>>>>>>>> kobject refcount is not zero, the data won't be freed.
>>>>>>>>
>>>>>>>> So, in the best case scenario, there's no opened file descriptors
>>>>>>>> by the time media device node is unregistered. So, it will free
>>>>>>>> everything.
>>>>>>>>
>>>>>>>> In the worse case scenario, e. g. when the driver is removed or
>>>>>>>> unbind while /dev/media0 has some opened file descriptor(s),
>>>>>>>> the cdev logic will do the proper lifetime management.
>>>>>>>>
>>>>>>>> In such a case, /dev/media0 disappears from the file system, so another
>>>>>>>> open is not possible anymore. The data structures will remain
>>>>>>>> allocated until all associated file descriptors are closed.
>>>>>>>>
>>>>>>>> When all file descriptors are closed, the data will be freed.
>>>>>>>>
>>>>>>>> On that time, it will call an optional dev.release() callback,
>>>>>>>> responsible to free any other data struct that the driver allocated.
>>>>>>>
>>>>>>> The patchset does not change this. It's not a question of the
>>>>>>> media_devnode struct either. That's not an issue.
>>>>>>>
>>>>>>> The issue is rather what else can be accessed through the media device
>>>>>>> and other interfaces. As IOCTLs are not serialised with device removal
>>>>>>> (which now releases much of the data structures)
>>>>>>
>>>>>> Huh? ioctls are serialized with struct device removal. The Driver core
>>>>>> warrants that.
>>>>>
>>>>> How?
>>>>>
>>>>> As far as I can tell, there's nothing in the way of an IOCTL being in
>>>>> progress on a character device which is registered by the driver for a
>>>>> hardware device which is being removed.
>>>>>
>>>>> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
>>>>> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
>>>>> are taken during that path, which I believe is by design.
>>>
>>> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
>>> on release(). Thus ensuring that the cdev can never be removed while in an
>>> ioctl.
>>>
>>>>>
>>>>>>> there's a high chance of accessing
>>>>>>> released memory (or mutexes that have been already destroyed). An
>>>>>>> example of that is here, stopping a running pipeline after unbinding
>>>>>>> the device. What happens there is that the media device is released
>>>>>>> whilst it's in use through the video device.
>>>>>>>
>>>>>>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
>>>>>>
>>>>>> It is not clear from the logs what the driver tried to do, but
>>>>>> that sounds like a bug in the driver, which was not prepared to properly
>>>>>> handle unbinds.
>>>>>>
>>>>>> The problem here is that isp_video_release() is called by V4L2
>>>>>> release logic, and not by the MC one:
>>>>>>
>>>>>> static const struct v4l2_file_operations isp_video_fops = {
>>>>>>       .owner          = THIS_MODULE,
>>>>>>       .open           = isp_video_open,
>>>>>>       .release        = isp_video_release,
>>>>>>       .poll           = vb2_fop_poll,
>>>>>>       .unlocked_ioctl = video_ioctl2,
>>>>>>       .mmap           = vb2_fop_mmap,
>>>>>> };
>>>>>>
>>>>>> It seems that the driver's logic allows it to be called before or
>>>>>> after destroying the MC.
>>>>>>
>>>>>> Assuming that, if the OMAP3 driver is not used it works,
>>>>>> it means that, if the isp_video_release() is called
>>>>>> first, no errors will happen, but if MC is destroyed before
>>>>>> V4L2 call to its .release() callback, as there's no logic at the
>>>>>> driver that would detect it, isp_video_release() will be calling
>>>>>> isp_video_streamoff(), which depends on the MC to work.
>>>>>>
>>>>>> On a first glance, I can see two ways of fixing it:
>>>>>>
>>>>>> 1) to increment devnode's device kobject refcount at OMAP3 .probe(),
>>>>>> decrementing it only at isp_video_release(). That will ensure that
>>>>>> MC will only be removed after V4L2 removal.
>>>>
>>>> As soon as you have to dig deep in a structure to find a reference counter and
>>>> increment it, bypassing all the API layers, you can be entirely sure that the
>>>> solution is wrong.
>>>
>>> Indeed.
>>>
>>>>
>>>>>> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
>>>>>> inside the MC .release() callback.
>>>>>
>>>>> This is a fair suggestion, indeed. Let me see what could be done there.
>>>>> Albeit this is just *one* of the existing issues. It will not address all
>>>>> problems fixed by the patchset.
>>>>
>>>> We need to stop the hardware at .remove() time. That should not be linked to a
>>>> videodev, v4l2_device or media_device .release() callback. When the .remove()
>>>> callback returns the driver is not allowed to touch the hardware anymore. In
>>>> particular, power domains might turn off clocks or power supplies, leading to invalid
>>>> access faults if we try to access hardware registers.
>>>
>>> Correct.
>>>
>>>>
>>>> USB devices get help from the USB core that cancels all USB operations in
>>>> progress when they're disconnected. Platform devices don't have it as easy,
>>>> and need to implement everything themselves. We thus need to stop the
>>>> hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF ioctl at
>>>> .remove() time.
>>>
>>> Please don't. This shouldn't be done automatically.
>>>
>>>> That could introduce other races between .remove() and the
>>>> userspace API. A better solution is to make sure the objects that are needed
>>>> at .release() time of the device node are all reference-counted and only
>>>> released when the last reference goes away.
>>>>
>>>> There's plenty of ways to try and work around the problem in drivers, some more
>>>> racy than others, but if we require changes to all platform drivers to fix
>>>> this we need to ensure that we get it right, not as half-baked hacks spread
>>>> around the whole subsystem.
>>>
>>> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
>>> device and I see no reason whatsoever to start modifying platform drivers just
>>> because you can do an unbind. I know there are real hot-pluggable devices, and
>>> getting this right for those is of course important.
>>
>> This was my first reaction when I saw this RFC series. None of the platform
>> drivers are designed to be unbound. Making core changes based on such a
>> driver would make the core very complex.
>>
>> We can't even reproduce the problem on other drivers.
>>
>>>
>>> If the omap3 is used as a testbed, then that's fine by me, but even then I
>>> probably wouldn't want the omap3 code that makes this possible in the kernel.
>>> It's just additional code for no purpose.
>>
>> I agree with Hans. Why are we using the most complex case as a reference driver
>> and using that driver as the basis for core changes which will force changes to
>> all the drivers that use mc-core?
>>
>>>
>>>>>> That could be done by overwriting the dev.release() callback at
>>>>>> omap3 driver, as I discussed on my past e-mails, and flagging the
>>>>>> driver that it should not accept streamon anymore, as the hardware
>>>>>> is being disconnected.
>>>>>
>>>>> A mutex will be needed to serialise this with starting streaming.
>>>>>
>>>>>> Btw, that explains a lot why Shuah can't reproduce the stuff you're
>>>>>> complaining on her USB hardware.
>>>>>>
>>>>>> The USB subsystem has a .disconnect() callback that notifies
>>>>>> the drivers that a device was unbound (likely physically removed).
>>>>>> The way USB media drivers handle it is by returning -ENODEV to any
>>>>>> V4L2 call that would try to touch at the hardware after unbound.
>>>>
>>>
>>> In my view the main problem is that the media core is bound to a struct
>>> device set by the driver that creates the MC. But since the MC gives an
>>> overview of lots of other (sub)devices the refcount of the media device
>>> should be increased for any (sub)device that adds itself to the MC and
>>> decreased for any (sub)device that is removed. Only when the very last
>>> user goes away can the MC memory be released.
>>
>> Correct. Media Device Allocator API work I did allows creating media device
>> on parent USB device to allow media sound driver share the media device. It
>> does ref-counting on media device and media device is unregistered only when
>> the last driver unregisters it.
>>
>> There is another aspect to explore regarding media device and the graph.
>>
>> Should all the entities stick around until all references to media
>> device are gone? If an application has /dev/media open, does that
>> mean all entities should not be free'd until this app. exits? What
>> should happen if an app. is streaming? Should the graph stay intact
>> until the app. exits?
> 
> Yes, everything must stay around until the last user has disappeared.
> 
> In general unplugs can happen at any time. So applications can be in the middle
> of an ioctl, and removing memory during that time is just impossible.
> 
> On unplug you:
> 
> 1) stop any HW DMA (highly device dependent)
> 2) wake up any filehandles that wait for an event
> 3) unregister any device nodes
> 
> Then just sit back and wait for refcounts to go down as filehandles are closed
> by the application.
> 
> Note: the v4l2/media/cec/IR/whatever core is typically responsible for rejecting
> any ioctls/mmap/etc. once the device node has been unregistered. The only valid
> file operation is release().
> 
>>
>>    If yes, this would pose problems when we have multiple drivers bound
>>    to the media device. When audio driver goes away for example, it should
>>    be allowed to delete its entities.
> 
> Only if you can safely remove it from the topology data structures while
> being 100% certain that nobody can ever access it. I'm not sure if that is
> the case. Actually, looking at e.g. adv7604.c it does media_entity_cleanup(&sd->entity);
> in remove() which is an empty function, so there doesn't appear any attempt
> to safely clean up an entity (i.e. make sure no running media ioctl can
> access it or call ops).

Right. media_entity_cleanup() does nothing at the moment. Also, if it gets called
after media_device_unregister_entity(), it could pose problems. I wonder if
we have drivers that call media_entity_cleanup() after unregistering
the entity?

> 
> This probably will need to be serialized with the graph_mutex lock.
> 
>>
>> The approach current mc-core takes is that the media_device and media_devnode
>> stick around, but entities can be added and removed during media_device
>> lifetime.
> 
> Seems reasonable. But the removal needs to be done carefully, and that doesn't
> seem to be the case now (unless adv7604.c is just buggy).

Correct. It is possible that media_device is embedded in this driver. When the
driver that embeds it is unbound, media_device goes away. I needed to make the
media device refcounted and sharable for the audio work, and that is what the
Media Device Allocator API does.

Maybe we have more cases than this audio case that require a refcounted media_device.
If we have to keep entities that are in use until all the references go away, we
have to ref-count them as well.

> 
>>
>> If an app. is still running when media_device is unregistered, media_device
>> isn't released until the last reference goes away and ioctls can check if
>> media_device is registered or not.
>>
>> We have to decide on the larger lifetime question surrounding media_device
>> and graph as well.
> 
> I don't think there is any choice but to keep it all alive until the last
> reference goes away.

If by "all alive" you mean entities as well, we have to ref-count them, because
drivers can unregister entities at run-time now. I am looking at the
use-case of a driver that has both dvb and video: what should happen when
dvb is unbound, for example? Should the dvb entities go away, or should they stay
until all the drivers are unbound?

v4l2-core registers and unregisters entities and so does dvb-core. So when a
driver unregisters video and dvb, these entities get deleted. In other words,
we have a distributed model of registering and unregistering entities. We also
have ioctls (video, dvb, and media) accessing these entities. So where do we make
changes to ensure entities stick around until all users exit?

Ref-counting entities won't work if they are embedded, as in the case of
struct video_device, which embeds the media entity. When struct video_device
goes away, the entity disappears with it. So we do have a complex lifetime model
here that we need to figure out how to fix.
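
To illustrate the constraint, here is a rough userspace sketch. It is not
mc-core code: toy_entity, toy_video_device and the toy_*() helpers are made-up
names, and the plain int counter stands in for a proper kref. Since an embedded
entity has no storage of its own, a reference on it can only be honoured by
pinning the structure that embeds it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a media entity. */
struct toy_entity {
	int id;
};

/* Hypothetical stand-in for a video device that embeds its entity. */
struct toy_video_device {
	int refcount;			/* lifetime of the whole allocation */
	struct toy_entity entity;	/* embedded: freed together with vdev */
};

static int toy_vdev_freed;	/* counts frees, for demonstration only */

static struct toy_video_device *toy_vdev_alloc(int id)
{
	struct toy_video_device *vdev = calloc(1, sizeof(*vdev));

	if (!vdev)
		return NULL;
	vdev->refcount = 1;	/* reference held by the driver */
	vdev->entity.id = id;
	return vdev;
}

/* Same idea as container_of(): find the struct that embeds the entity. */
static struct toy_video_device *toy_entity_to_vdev(struct toy_entity *ent)
{
	return (struct toy_video_device *)
		((char *)ent - offsetof(struct toy_video_device, entity));
}

static void toy_vdev_put(struct toy_video_device *vdev)
{
	assert(vdev->refcount > 0);
	if (--vdev->refcount == 0) {
		free(vdev);
		toy_vdev_freed++;
	}
}

/* An entity reference is really a reference on the embedding struct. */
static void toy_entity_get(struct toy_entity *ent)
{
	toy_entity_to_vdev(ent)->refcount++;
}

static void toy_entity_put(struct toy_entity *ent)
{
	toy_vdev_put(toy_entity_to_vdev(ent));
}
```

In this toy model the driver dropping its own reference no longer frees the
entity while someone still holds it, but only because every entity user ends
up sharing the container's refcount. That is one reading of the problem, not
a proposal for how mc-core should do it.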

> 
> Regards,
> 
>     Hans

thanks,
-- Shuah


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 15:45                                 ` Mauro Carvalho Chehab
@ 2016-12-15 16:07                                   ` Hans Verkuil
  2016-12-16 16:47                                   ` Laurent Pinchart
  1 sibling, 0 replies; 89+ messages in thread
From: Hans Verkuil @ 2016-12-15 16:07 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Laurent Pinchart, Sakari Ailus, Shuah Khan, Sakari Ailus, linux-media

On 15/12/16 16:45, Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 15:45:22 +0100
> Hans Verkuil <hverkuil@xs4all.nl> escreveu:
>
>> On 15/12/16 15:32, Mauro Carvalho Chehab wrote:
>>> Em Thu, 15 Dec 2016 15:03:36 +0100
>>> Hans Verkuil <hverkuil@xs4all.nl> escreveu:
>>>
>>>> On 15/12/16 13:56, Laurent Pinchart wrote:
>>>>> Hi Sakari,
>>>>>
>>>>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>>>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>>>> Hi Sakari,
>>>>>>>>>
>>>
>>>
>>>>> There's plenty of ways to try and work around the problem in drivers, some more
>>>>> racy than others, but if we require changes to all platform drivers to fix
>>>>> this we need to ensure that we get it right, not as half-baked hacks spread
>>>>> around the whole subsystem.
>>>>
>>>> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
>>>> device and I see no reason whatsoever to start modifying platform drivers just
>>>> because you can do an unbind. I know there are real hot-pluggable devices, and
>>>> getting this right for those is of course important.
>>>
>>> That's indeed a very good point. If unbind is not needed by any usecase,
>>> the better fix for OMAP3 would be to just prevent it to happen in the first
>>> place.
>>>
>>>>>>> The USB subsystem has a .disconnect() callback that notifies
>>>>>>> the drivers that a device was unbound (likely physically removed).
>>>>>>> The way USB media drivers handle it is by returning -ENODEV to any
>>>>>>> V4L2 call that would try to touch at the hardware after unbound.
>>>>>
>>>>
>>>> In my view the main problem is that the media core is bound to a struct
>>>> device set by the driver that creates the MC. But since the MC gives an
>>>> overview of lots of other (sub)devices the refcount of the media device
>>>> should be increased for any (sub)device that adds itself to the MC and
>>>> decreased for any (sub)device that is removed. Only when the very last
>>>> user goes away can the MC memory be released.
>>>>
>>>> The memory/refcounting associated with device nodes is unrelated to this:
>>>> once a devnode is unregistered it will be removed in /dev, and once the
>>>> last open fh closes any memory associated with the devnode can be released.
>>>> That will also decrease the refcount to its parent device.
>>>>
>>>> This also means that it is a bad idea to embed devnodes in a larger struct.
>>>> They should be allocated and freed when the devnode is unregistered and
>>>> the last open filehandle is closed.
>>>>
>>>> Then the parent's device refcount is decreased, and that may now call its
>>>> release callback if the refcount reaches 0.
>>>>
>>>> For the media controller's device: any other device driver that needs access
>>>> to it needs to increase that device's refcount, and only when those devices
>>>> are released will they decrease the MC device's refcount.
>>>>
>>>> And when that refcount goes to 0 can we finally free everything.
>>>>
>>>> With regards to the opposition to reverting those initial patches, I'm
>>>> siding with Greg KH. Just revert the bloody patches. It worked most of the
>>>> time before those patches, so reverting really won't cause bisect problems.
>>>
>>> You're contradicting yourself here ;)
>>>
>>> The patches that this patch series is reverting are the ones that
>>> de-embed the devnode struct and fix its lifecycle.
>>>
>>> Reverting those patches will cause regressions on hot-pluggable drivers,
>>> preventing them to be unplugged. So, if we're willing to revert, then we
>>> should also revert MC support on them.
>>
>> Two options:
>>
>> 1) Revert, then build up a proper solution.
>
> Reverting is a regression, as we'll strip off the MC support from the
> existing devices. We would also need to revert a lot more than just those
> 3 patches.
>
>> 2) Do a big-bang patch switching directly over to the new solution, but that's
>> very hard to review.
>> 2a) Post the patch series in small chunks on the mailinglist (starting with the
>> reverts), but once we're all happy merge that patch series into a single big-bang
>> patch and apply that.
>
> We could do that, but what has been submitted so far is incomplete:
> it only touches a single driver (which doesn't require hot-plugging),
> breaking all the other ones.

Step 1 is to find a solution that works and isn't a hack or workaround.

The next step is to roll it out for all drivers (ideally with an absolute minimum of
required changes), and the final step is to figure out how to organize the patch
series so that it remains bisectable, etc.

But the first step is the most important, and one that should be reviewed
even if the way the patch series is organized might/will cause problems when
it is merged.

This whole discussion about those revert patches is simply not relevant at the
moment.

Regards,

	Hans

>
>> As far as I am concerned the whole hotplugging code is broken and has been for
>> a very long time. We (or at least I :-) ) understand the underlying concepts
>> a lot better, so we can do a better job. But the transition may well be
>> painful.
>
> It is not broken currently on the devices that require hotplugging.
>
> Thanks,
> Mauro
>



* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 16:06                                 ` Shuah Khan
@ 2016-12-15 16:28                                   ` Hans Verkuil
  2016-12-15 17:09                                     ` Shuah Khan
  2016-12-23 18:13                                   ` Laurent Pinchart
  1 sibling, 1 reply; 89+ messages in thread
From: Hans Verkuil @ 2016-12-15 16:28 UTC (permalink / raw)
  To: Shuah Khan, Laurent Pinchart, Sakari Ailus
  Cc: Mauro Carvalho Chehab, Sakari Ailus, linux-media

On 15/12/16 17:06, Shuah Khan wrote:
> On 12/15/2016 08:26 AM, Hans Verkuil wrote:
>> On 15/12/16 15:45, Shuah Khan wrote:
>>> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
>>>> On 15/12/16 13:56, Laurent Pinchart wrote:
>>>>> Hi Sakari,
>>>>>
>>>>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>>>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>>>> Hi Sakari,
>>>>>>>>>
>>>>>>>>> I answered you point to point below, but I suspect that you missed how
>>>>>>>>> the current approach works. So, I decided to write a quick summary
>>>>>>>>> here.
>>>>>>>>>
>>>>>>>>> The character devices /dev/media? are created via cdev, which relies on
>>>>>>>>> a kobject per device, which has an embedded struct kref inside.
>>>>>>>>>
>>>>>>>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>>>>>>>> struct device, when the code does:
>>>>>>>>>   devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>>>>>>>
>>>>>>>>> before calling cdev_add().
>>>>>>>>>
>>>>>>>>> The current lifetime management is actually based on cdev's kobject's
>>>>>>>>> refcount, provided by its embedded kref.
>>>>>>>>>
>>>>>>>>> The kref warrants that any data associated with /dev/media0 won't be
>>>>>>>>> freed if there are any pending system call. In other words, when
>>>>>>>>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>>>>>>>>> and will call kobject_put().
>>>>>>>>>
>>>>>>>>> If the refcount is zero, it will call devnode->dev.release(). If the
>>>>>>>>> kobject refcount is not zero, the data won't be freed.
>>>>>>>>>
>>>>>>>>> So, in the best case scenario, there's no opened file descriptors
>>>>>>>>> by the time media device node is unregistered. So, it will free
>>>>>>>>> everything.
>>>>>>>>>
>>>>>>>>> In the worse case scenario, e. g. when the driver is removed or
>>>>>>>>> unbind while /dev/media0 has some opened file descriptor(s),
>>>>>>>>> the cdev logic will do the proper lifetime management.
>>>>>>>>>
>>>>>>>>> In such a case, /dev/media0 disappears from the file system, so another
>>>>>>>>> open is not possible anymore. The data structures will remain
>>>>>>>>> allocated until all associated file descriptors are closed.
>>>>>>>>>
>>>>>>>>> When all file descriptors are closed, the data will be freed.
>>>>>>>>>
>>>>>>>>> On that time, it will call an optional dev.release() callback,
>>>>>>>>> responsible to free any other data struct that the driver allocated.
>>>>>>>>
>>>>>>>> The patchset does not change this. It's not a question of the
>>>>>>>> media_devnode struct either. That's not an issue.
>>>>>>>>
>>>>>>>> The issue is rather what else can be accessed through the media device
>>>>>>>> and other interfaces. As IOCTLs are not serialised with device removal
>>>>>>>> (which now releases much of the data structures)
>>>>>>>
>>>>>>> Huh? ioctls are serialized with struct device removal. The Driver core
>>>>>>> warrants that.
>>>>>>
>>>>>> How?
>>>>>>
>>>>>> As far as I can tell, there's nothing in the way of an IOCTL being in
>>>>>> progress on a character device which is registered by the driver for a
>>>>>> hardware device which is being removed.
>>>>>>
>>>>>> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
>>>>>> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
>>>>>> are taken during that path, which I believe is by design.
>>>>
>>>> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
>>>> on release(). Thus ensuring that the cdev can never be removed while in an
>>>> ioctl.
>>>>
>>>>>>
>>>>>>>> there's a high chance of accessing
>>>>>>>> released memory (or mutexes that have been already destroyed). An
>>>>>>>> example of that is here, stopping a running pipeline after unbinding
>>>>>>>> the device. What happens there is that the media device is released
>>>>>>>> whilst it's in use through the video device.
>>>>>>>>
>>>>>>>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
>>>>>>>
>>>>>>> It is not clear from the logs what the driver tried to do, but
>>>>>>> that sounds like a bug in the driver, which was not prepared to properly
>>>>>>> handle unbinds.
>>>>>>>
>>>>>>> The problem here is that isp_video_release() is called by V4L2
>>>>>>> release logic, and not by the MC one:
>>>>>>>
>>>>>>> static const struct v4l2_file_operations isp_video_fops = {
>>>>>>>       .owner          = THIS_MODULE,
>>>>>>>       .open           = isp_video_open,
>>>>>>>       .release        = isp_video_release,
>>>>>>>       .poll           = vb2_fop_poll,
>>>>>>>       .unlocked_ioctl = video_ioctl2,
>>>>>>>       .mmap           = vb2_fop_mmap,
>>>>>>> };
>>>>>>>
>>>>>>> It seems that the driver's logic allows it to be called before or
>>>>>>> after destroying the MC.
>>>>>>>
>>>>>>> Assuming that, if the OMAP3 driver is not used it works,
>>>>>>> it means that, if the isp_video_release() is called
>>>>>>> first, no errors will happen, but if MC is destroyed before
>>>>>>> V4L2 call to its .release() callback, as there's no logic at the
>>>>>>> driver that would detect it, isp_video_release() will be calling
>>>>>>> isp_video_streamoff(), which depends on the MC to work.
>>>>>>>
>>>>>>> On a first glance, I can see two ways of fixing it:
>>>>>>>
>>>>>>> 1) to increment devnode's device kobject refcount at OMAP3 .probe(),
>>>>>>> decrementing it only at isp_video_release(). That will ensure that
>>>>>>> MC will only be removed after V4L2 removal.
>>>>>
>>>>> As soon as you have to dig deep in a structure to find a reference counter and
>>>>> increment it, bypassing all the API layers, you can be entirely sure that the
>>>>> solution is wrong.
>>>>
>>>> Indeed.
>>>>
>>>>>
>>>>>>> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
>>>>>>> inside the MC .release() callback.
>>>>>>
>>>>>> This is a fair suggestion, indeed. Let me see what could be done there.
>>>>>> Albeit this is just *one* of the existing issues. It will not address all
>>>>>> problems fixed by the patchset.
>>>>>
>>>>> We need to stop the hardware at .remove() time. That should not be linked to a
>>>>> videodev, v4l2_device or media_device .release() callback. When the .remove()
>>>>> callback returns the driver is not allowed to touch the hardware anymore. In
>>>>> particular, power domains might turn off clocks or power supplies, leading to invalid
>>>>> access faults if we try to access hardware registers.
>>>>
>>>> Correct.
>>>>
>>>>>
>>>>> USB devices get help from the USB core that cancels all USB operations in
>>>>> progress when they're disconnected. Platform devices don't have it as easy,
>>>>> and need to implement everything themselves. We thus need to stop the
>>>>> hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF ioctl at
>>>>> .remove() time.
>>>>
>>>> Please don't. This shouldn't be done automatically.
>>>>
>>>>> That could introduce other races between .remove() and the
>>>>> userspace API. A better solution is to make sure the objects that are needed
>>>>> at .release() time of the device node are all reference-counted and only
>>>>> released when the last reference goes away.
>>>>>
>>>>> There's plenty of ways to try and work around the problem in drivers, some more
>>>>> racy than others, but if we require changes to all platform drivers to fix
>>>>> this we need to ensure that we get it right, not as half-baked hacks spread
>>>>> around the whole subsystem.
>>>>
>>>> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
>>>> device and I see no reason whatsoever to start modifying platform drivers just
>>>> because you can do an unbind. I know there are real hot-pluggable devices, and
>>>> getting this right for those is of course important.
>>>
>>> This was my first reaction when I saw this RFC series. None of the platform
>>> drivers are designed to be unbound. Making core changes based on such a
>>> driver would make the core very complex.
>>>
>>> We can't even reproduce the problem on other drivers.
>>>
>>>>
>>>> If the omap3 is used as a testbed, then that's fine by me, but even then I
>>>> probably wouldn't want the omap3 code that makes this possible in the kernel.
>>>> It's just additional code for no purpose.
>>>
>>> I agree with Hans. Why are we using the most complex case as a reference driver
>>> and using that driver as the basis for core changes which will force changes to
>>> all the drivers that use mc-core?
>>>
>>>>
>>>>>>> That could be done by overwriting the dev.release() callback at
>>>>>>> omap3 driver, as I discussed on my past e-mails, and flagging the
>>>>>>> driver that it should not accept streamon anymore, as the hardware
>>>>>>> is being disconnected.
>>>>>>
>>>>>> A mutex will be needed to serialise this with starting streaming.
>>>>>>
>>>>>>> Btw, that explains a lot why Shuah can't reproduce the stuff you're
>>>>>>> complaining on her USB hardware.
>>>>>>>
>>>>>>> The USB subsystem has a .disconnect() callback that notifies
>>>>>>> the drivers that a device was unbound (likely physically removed).
>>>>>>> The way USB media drivers handle it is by returning -ENODEV to any
>>>>>>> V4L2 call that would try to touch at the hardware after unbound.
>>>>>
>>>>
>>>> In my view the main problem is that the media core is bound to a struct
>>>> device set by the driver that creates the MC. But since the MC gives an
>>>> overview of lots of other (sub)devices the refcount of the media device
>>>> should be increased for any (sub)device that adds itself to the MC and
>>>> decreased for any (sub)device that is removed. Only when the very last
>>>> user goes away can the MC memory be released.
>>>
>>> Correct. Media Device Allocator API work I did allows creating media device
>>> on parent USB device to allow media sound driver share the media device. It
>>> does ref-counting on media device and media device is unregistered only when
>>> the last driver unregisters it.
>>>
>>> There is another aspect to explore regarding media device and the graph.
>>>
>>> Should all the entities stick around until all references to media
>>> device are gone? If an application has /dev/media open, does that
>>> mean all entities should not be free'd until this app. exits? What
>>> should happen if an app. is streaming? Should the graph stay intact
>>> until the app. exits?
>>
>> Yes, everything must stay around until the last user has disappeared.
>>
>> In general unplugs can happen at any time. So applications can be in the middle
>> of an ioctl, and removing memory during that time is just impossible.
>>
>> On unplug you:
>>
>> 1) stop any HW DMA (highly device dependent)
>> 2) wake up any filehandles that wait for an event
>> 3) unregister any device nodes
>>
>> Then just sit back and wait for refcounts to go down as filehandles are closed
>> by the application.
>>
>> Note: the v4l2/media/cec/IR/whatever core is typically responsible for rejecting
>> any ioctls/mmap/etc. once the device node has been unregistered. The only valid
>> file operation is release().
>>
>>>
>>>    If yes, this would pose problems when we have multiple drivers bound
>>>    to the media device. When audio driver goes away for example, it should
>>>    be allowed to delete its entities.
>>
>> Only if you can safely remove it from the topology data structures while
>> being 100% certain that nobody can ever access it. I'm not sure if that is
>> the case. Actually, looking at e.g. adv7604.c it does media_entity_cleanup(&sd->entity);
>> in remove() which is an empty function, so there doesn't appear any attempt
>> to safely clean up an entity (i.e. make sure no running media ioctl can
>> access it or call ops).
>
> Right. media_entity_cleanup() nothing at the moment. Also if it gets called
> after media_device_unregister_entity(), it could pose problems. I wonder if
> we have drivers that are calling media_entity_cleanup() after unregistering
> the entity?
>
>>
>> This probably will need to be serialized with the graph_mutex lock.
>>
>>>
>>> The approach current mc-core takes is that the media_device and media_devnode
>>> stick around, but entities can be added and removed during media_device
>>> lifetime.
>>
>> Seems reasonable. But the removal needs to be done carefully, and that doesn't
>> seem to be the case now (unless adv7604.c is just buggy).
>
> Correct. It is possible media_device is embedded in this driver. When driver
> that embeds is unbound, media_device goes away. I needed to make the media
> device refcounted and sharable for audio work and that is what the Media Device
> Allocator API does.

Basically all you need to do is to refcount the struct device in the media_device:
call get_device(mdev->dev) when you take a reference, and put_device(mdev->dev)
when you no longer need it. The mdev itself is freed when the mdev->dev refcount
goes to 0.

No need to add another kref.
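
As a rough illustration of that pattern, here is a userspace model (not
kernel code). The model_* names are made up for this sketch and only mimic
what get_device()/put_device() do with the embedded struct device; the
counter is a plain int, so it is not safe against concurrent callers.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace stand-in for struct device: a counter plus a release hook. */
struct model_device {
	int refcount;
	void (*release)(struct model_device *dev);
};

/* Stand-in for struct media_device with its embedded device. */
struct model_media_device {
	struct model_device dev;	/* must stay the first member */
	int *freed_flag;		/* lets a caller observe the free */
};

/* Take a reference, as get_device() would. */
static struct model_device *model_get_device(struct model_device *dev)
{
	dev->refcount++;
	return dev;
}

/* Drop a reference; the last put calls the release hook. */
static void model_put_device(struct model_device *dev)
{
	if (--dev->refcount == 0)
		dev->release(dev);
}

/* Release hook: the only place where the media device memory is freed. */
static void model_mdev_release(struct model_device *dev)
{
	/* dev is the first member, so the cast recovers the container. */
	struct model_media_device *mdev = (struct model_media_device *)dev;

	*mdev->freed_flag = 1;
	free(mdev);
}

/* Allocate the model media device holding one initial reference. */
static struct model_media_device *model_mdev_alloc(int *freed_flag)
{
	struct model_media_device *mdev = calloc(1, sizeof(*mdev));

	if (!mdev)
		return NULL;
	mdev->dev.refcount = 1;
	mdev->dev.release = model_mdev_release;
	mdev->freed_flag = freed_flag;
	*freed_flag = 0;
	return mdev;
}
```

Every user (a video node, a file handle) takes its own reference, and the
memory goes away only in the release hook, whichever put happens to be last.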


>
> Maybe we have more cases than this audio case that requires media_device refcounted.
> If we have to keep entities that are in use until all the references go away, we
> have to ref-count them as well.
>
>>
>>>
>>> If an app. is still running when media_device is unregistered, media_device
>>> isn't released until the last reference goes away and ioctls can check if
>>> media_device is registered or not.
>>>
>>> We have to decide on the larger lifetime question surrounding media_device
>>> and graph as well.
>>
>> I don't think there is any choice but to keep it all alive until the last
>> reference goes away.
>
> If you mean "all alive" entities as well, we have to ref-count them. Because
> drivers can unregister entities during run-time now. I am looking at the
> use-case where, a driver that has dvb and video and what should happen when
> dvb is unbound for example. Should dvb entities go away or should they stay
> until all the drivers are unbound?

That depends on the architecture. If these are completely independent devices
then you want to allow this, if possible to do this safely. But for e.g.
subdevices that depend on a parent device the unbind should just be prohibited.
There is no point whatsoever in allowing that.

>
> v4l2-core registers and unregisters entities and so does dvb-core. So when a
> driver unregisters video and dvb, these entities get deleted. So we have a
> distributed mode of registering and unregistering entities. We also have
> ioctls (video, dvb, and media) accessing these entities. So where do we make
> changes to ensure entities stick around until all users exit?
>
> Ref-counting entities won't work if they are embedded - like in the case of
> struct video_device which embeds the media entity. When struct video goes
> away then entity will disappear. So we do have a complex lifetime model here
> that we need to figure out how to fix.

That's why I think the best approach would be to safely delete them from the
MC graph: take the top-level lock (graph_mutex I think) and remove all references
before releasing the lock.
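
Roughly what I have in mind, as a userspace model rather than kernel code
(the model_* names and the fixed-size array are assumptions made for this
sketch; the real graph uses lists and more than one kind of object):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ENTITIES 8

struct model_entity {
	int id;
};

struct model_graph {
	pthread_mutex_t graph_mutex;	/* top-level lock for the graph */
	struct model_entity *entities[MODEL_MAX_ENTITIES];
};

static void model_graph_init(struct model_graph *g)
{
	size_t i;

	pthread_mutex_init(&g->graph_mutex, NULL);
	for (i = 0; i < MODEL_MAX_ENTITIES; i++)
		g->entities[i] = NULL;
}

/* Add an entity to the first free slot; returns -1 if the graph is full. */
static int model_graph_add(struct model_graph *g, struct model_entity *e)
{
	size_t i;
	int ret = -1;

	pthread_mutex_lock(&g->graph_mutex);
	for (i = 0; i < MODEL_MAX_ENTITIES; i++) {
		if (!g->entities[i]) {
			g->entities[i] = e;
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&g->graph_mutex);
	return ret;
}

/* Drop every reference the graph holds to e before releasing the lock. */
static void model_graph_remove(struct model_graph *g, struct model_entity *e)
{
	size_t i;

	pthread_mutex_lock(&g->graph_mutex);
	for (i = 0; i < MODEL_MAX_ENTITIES; i++) {
		if (g->entities[i] == e)
			g->entities[i] = NULL;
	}
	pthread_mutex_unlock(&g->graph_mutex);
}

/* Look an entity up the way an ioctl would: only while holding the lock. */
static bool model_graph_contains(struct model_graph *g, int id)
{
	size_t i;
	bool found = false;

	pthread_mutex_lock(&g->graph_mutex);
	for (i = 0; i < MODEL_MAX_ENTITIES; i++) {
		if (g->entities[i] && g->entities[i]->id == id)
			found = true;
	}
	pthread_mutex_unlock(&g->graph_mutex);
	return found;
}
```

Once model_graph_remove() returns, no walker that takes the lock can reach
the entity any more, so the owner is free to release its memory. This only
holds if nobody keeps a pointer to the entity after dropping the lock.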

I think this will work for interface entities, but for subdev entities this
certainly won't work. Unbinding subdevs should be blocked (just set
suppress_bind_attrs to true in all subdev drivers). Most top-level drivers
have pointers to subdev data, so unbinding them will just fail horribly.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 15:26                               ` Hans Verkuil
  2016-12-15 16:06                                 ` Shuah Khan
@ 2016-12-15 17:08                                 ` Mauro Carvalho Chehab
  2016-12-23 17:55                                   ` Laurent Pinchart
  2016-12-23 17:48                                 ` Laurent Pinchart
  2 siblings, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-15 17:08 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Shuah Khan, Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

Em Thu, 15 Dec 2016 16:26:19 +0100
Hans Verkuil <hverkuil@xs4all.nl> escreveu:

> > Should all the entities stick around until all references to media
> > device are gone? If an application has /dev/media open, does that
> > mean all entities should not be free'd until this app. exits? What
> > should happen if an app. is streaming? Should the graph stay intact
> > until the app. exits?  
> 
> Yes, everything must stay around until the last user has disappeared.
> 
> In general unplugs can happen at any time. So applications can be in the middle
> of an ioctl, and removing memory during that time is just impossible.
> 
> On unplug you:
> 
> 1) stop any HW DMA (highly device dependent)
> 2) wake up any filehandles that wait for an event
> 3) unregister any device nodes
> 
> Then just sit back and wait for refcounts to go down as filehandles are closed
> by the application.
> 
> Note: the v4l2/media/cec/IR/whatever core is typically responsible for rejecting
> any ioctls/mmap/etc. once the device node has been unregistered. The only valid
> file operation is release().

Agreed. The problem on OMAP3 is that it doesn't stop HW DMA when
struct media_devnode is released. It tries to do it later, when the
V4L2 core is unbound, by trying to dig into the media controller
struct that the driver removed before.
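
For reference, the unplug order Hans lists above can be modelled in
userspace like this. It is only a sketch: the model_* names are made up,
and returning -ENODEV from every call after unregistration is an assumption
(the exact error code depends on the core and on the file operation).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace model of a device node and the hardware behind it. */
struct model_devnode {
	bool registered;	/* device node still visible and usable */
	bool dma_running;	/* hardware DMA active */
	int wakeups;		/* how often sleeping waiters were woken */
	int open_handles;	/* file handles still held by applications */
	bool released;		/* memory would have been freed at this point */
};

/* Free only when the node is unregistered and nobody has it open. */
static void model_maybe_free(struct model_devnode *n)
{
	if (!n->registered && n->open_handles == 0)
		n->released = true;
}

static int model_open(struct model_devnode *n)
{
	if (!n->registered)
		return -ENODEV;
	n->open_handles++;
	return 0;
}

/* The core rejects ioctls once the node has been unregistered. */
static int model_ioctl(struct model_devnode *n)
{
	if (!n->registered)
		return -ENODEV;
	return 0;
}

/* release() is the only file operation that stays valid after unplug. */
static void model_release(struct model_devnode *n)
{
	n->open_handles--;
	model_maybe_free(n);
}

static void model_unplug(struct model_devnode *n)
{
	n->dma_running = false;	/* 1) stop any HW DMA */
	n->wakeups++;		/* 2) wake up waiting file handles */
	n->registered = false;	/* 3) unregister the device node */
	model_maybe_free(n);	/* then wait for the refcount to drop */
}
```

The point of the order is that DMA is stopped in the unplug path itself,
not in a later release callback that may run after the media controller
data is gone.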

That said, for OMAP3 and all other drivers that don't support hot unplug,
I would just use suppress_bind_attrs, as I fail to see any need to allow
unbinding them.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 16:28                                   ` Hans Verkuil
@ 2016-12-15 17:09                                     ` Shuah Khan
  2016-12-15 17:25                                       ` Mauro Carvalho Chehab
  2016-12-16 10:03                                       ` Hans Verkuil
  0 siblings, 2 replies; 89+ messages in thread
From: Shuah Khan @ 2016-12-15 17:09 UTC (permalink / raw)
  To: Hans Verkuil, Laurent Pinchart, Sakari Ailus
  Cc: Mauro Carvalho Chehab, Sakari Ailus, linux-media, Shuah Khan

On 12/15/2016 09:28 AM, Hans Verkuil wrote:
> On 15/12/16 17:06, Shuah Khan wrote:
>> On 12/15/2016 08:26 AM, Hans Verkuil wrote:
>>> On 15/12/16 15:45, Shuah Khan wrote:
>>>> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
>>>>> On 15/12/16 13:56, Laurent Pinchart wrote:
>>>>>> Hi Sakari,
>>>>>>
>>>>>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>>>>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>>>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>>>>> Hi Sakari,
>>>>>>>>>>
>>>>>>>>>> I answered you point to point below, but I suspect that you missed how
>>>>>>>>>> the current approach works. So, I decided to write a quick summary
>>>>>>>>>> here.
>>>>>>>>>>
>>>>>>>>>> The character devices /dev/media? are created via cdev, with relies on
>>>>>>>>>> a kobject per device, with has an embedded struct kref inside.
>>>>>>>>>>
>>>>>>>>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>>>>>>>>> struct device, when the code does:
>>>>>>>>>>   devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>>>>>>>>
>>>>>>>>>> before calling cdev_add().
>>>>>>>>>>
>>>>>>>>>> The current lifetime management is actually based on cdev's kobject's
>>>>>>>>>> refcount, provided by its embedded kref.
>>>>>>>>>>
>>>>>>>>>> The kref warrants that any data associated with /dev/media0 won't be
>>>>>>>>>> freed if there are any pending system call. In other words, when
>>>>>>>>>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>>>>>>>>>> and will call kobject_put().
>>>>>>>>>>
>>>>>>>>>> If the refcount is zero, it will call devnode->dev.release(). If the
>>>>>>>>>> kobject refcount is not zero, the data won't be freed.
>>>>>>>>>>
>>>>>>>>>> So, in the best case scenario, there's no opened file descriptors
>>>>>>>>>> by the time media device node is unregistered. So, it will free
>>>>>>>>>> everything.
>>>>>>>>>>
>>>>>>>>>> In the worse case scenario, e. g. when the driver is removed or
>>>>>>>>>> unbind while /dev/media0 has some opened file descriptor(s),
>>>>>>>>>> the cdev logic will do the proper lifetime management.
>>>>>>>>>>
>>>>>>>>>> On such case, /dev/media0 disappears from the file system, so another
>>>>>>>>>> open is not possible anymore. The data structures will remain
>>>>>>>>>> allocated until all associated file descriptors are not closed.
>>>>>>>>>>
>>>>>>>>>> When all file descriptors are closed, the data will be freed.
>>>>>>>>>>
>>>>>>>>>> On that time, it will call an optional dev.release() callback,
>>>>>>>>>> responsible to free any other data struct that the driver allocated.
>>>>>>>>>
>>>>>>>>> The patchset does not change this. It's not a question of the
>>>>>>>>> media_devnode struct either. That's not an issue.
>>>>>>>>>
>>>>>>>>> The issue is rather what else can be accessed through the media device
>>>>>>>>> and other interfaces. As IOCTLs are not serialised with device removal
>>>>>>>>> (which now releases much of the data structures)
>>>>>>>>
>>>>>>>> Huh? ioctls are serialized with struct device removal. The Driver core
>>>>>>>> warrants that.
>>>>>>>
>>>>>>> How?
>>>>>>>
>>>>>>> As far as I can tell, there's nothing in the way of an IOCTL being in
>>>>>>> progress on a character device which is registered by the driver for a
>>>>>>> hardware device which is being removed.
>>>>>>>
>>>>>>> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
>>>>>>> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
>>>>>>> are taken during that path, which I believe is by design.
>>>>>
>>>>> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
>>>>> on release(). Thus ensuring that the cdev can never be removed while in an
>>>>> ioctl.
>>>>>
>>>>>>>
>>>>>>>>> there's a high chance of accessing
>>>>>>>>> released memory (or mutexes that have been already destroyed). An
>>>>>>>>> example of that is here, stopping a running pipeline after unbinding
>>>>>>>>> the device. What happens there is that the media device is released
>>>>>>>>> whilst it's in use through the video device.
>>>>>>>>>
>>>>>>>>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
>>>>>>>>
>>>>>>>> It is not clear from the logs what the driver tried to do, but
>>>>>>>> that sounds like a driver's bug, with was not prepared to properly
>>>>>>>> handle unbinds.
>>>>>>>>
>>>>>>>> The problem here is that isp_video_release() is called by V4L2
>>>>>>>> release logic, and not by the MC one:
>>>>>>>>
>>>>>>>> static const struct v4l2_file_operations isp_video_fops = {
>>>>>>>>       .owner          = THIS_MODULE,
>>>>>>>>       .open           = isp_video_open,
>>>>>>>>       .release        = isp_video_release,
>>>>>>>>       .poll           = vb2_fop_poll,
>>>>>>>>       .unlocked_ioctl = video_ioctl2,
>>>>>>>>       .mmap           = vb2_fop_mmap,
>>>>>>>> };
>>>>>>>>
>>>>>>>> It seems that the driver's logic allows it to be called before or
>>>>>>>> after destroying the MC.
>>>>>>>>
>>>>>>>> Assuming that, if the OMAP3 driver is not used it works,
>>>>>>>> it means that, if the isp_video_release() is called
>>>>>>>> first, no errors will happen, but if MC is destroyed before
>>>>>>>> V4L2 call to its .release() callback, as there's no logic at the
>>>>>>>> driver that would detect it, isp_video_release() will be calling
>>>>>>>> isp_video_streamoff(), with depends on the MC to work.
>>>>>>>>
>>>>>>>> On a first glance, I can see two ways of fixing it:
>>>>>>>>
>>>>>>>> 1) to increment devnode's device kobject refcount at OMAP3 .probe(),
>>>>>>>> decrementing it only at isp_video_release(). That will ensure that
>>>>>>>> MC will only be removed after V4L2 removal.
>>>>>>
>>>>>> As soon as you have to dig deep in a structure to find a reference counter and
>>>>>> increment it, bypassing all the API layers, you can be entirely sure that the
>>>>>> solution is wrong.
>>>>>
>>>>> Indeed.
>>>>>
>>>>>>
>>>>>>>> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
>>>>>>>> inside the MC .release() callback.
>>>>>>>
>>>>>>> This is a fair suggestion, indeed. Let me see what could be done there.
>>>>>>> Albeit this is just *one* of the existing issues. It will not address all
>>>>>>> problems fixed by the patchset.
>>>>>>
>>>>>> We need to stop the hardware at .remove() time. That should not be linked to a
>>>>>> videodev, v4l2_device or media_device .release() callback. When the .remove()
>>>>>> callback returns the driver is not allowed to touch the hardware anymore. In
>>>>>> particular, power domains might clocks or power supplies, leading to invalid
>>>>>> access faults if we try to access hardware registers.
>>>>>
>>>>> Correct.
>>>>>
>>>>>>
>>>>>> USB devices get help from the USB core that cancels all USB operations in
>>>>>> progress when they're disconnected. Platform devices don't have it as easy,
>>>>>> and need to implement everything themselves. We thus need to stop the
>>>>>> hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF ioctl at
>>>>>> .remove() time.
>>>>>
>>>>> Please don't. This shouldn't be done automatically.
>>>>>
>>>>>> That could introduce other races between .remove() and the
>>>>>> userspace API. A better solution is to make sure the objects that are needed
>>>>>> at .release() time of the device node are all reference-counted and only
>>>>>> released when the last reference goes away.
>>>>>>
>>>>>> There's plenty of way to try and work around the problem in drivers, some more
>>>>>> racy than others, but if we require changes to all platform drivers to fix
>>>>>> this we need to ensure that we get it right, not as half-baked hacks spread
>>>>>> around the whole subsystem.
>>>>>
>>>>> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
>>>>> device and I see no reason whatsoever to start modifying platform drivers just
>>>>> because you can do an unbind. I know there are real hot-pluggable devices, and
>>>>> getting this right for those is of course important.
>>>>
>>>> This was my first reaction when I saw this RFC series. None of the platform
>>>> drivers are designed to be unbound. Making core changes based on such as
>>>> driver would make the core very complex.
>>>>
>>>> We can't even reproduce the problem on other drivers.
>>>>
>>>>>
>>>>> If the omap3 is used as a testbed, then that's fine by me, but even then I
>>>>> probably wouldn't want the omap3 code that makes this possible in the kernel.
>>>>> It's just additional code for no purpose.
>>>>
>>>> I agree with Hans. Why are we using the most complex case as a reference driver
>>>> and basing that driver to make core changes which will force changes to all the
>>>> driver that use mc-core?
>>>>
>>>>>
>>>>>>>> That could be done by overwriting the dev.release() callback at
>>>>>>>> omap3 driver, as I discussed on my past e-mails, and flagging the
>>>>>>>> driver that it should not accept streamon anymore, as the hardware
>>>>>>>> is being disconnecting.
>>>>>>>
>>>>>>> A mutex will be needed to serialise the this with starting streaming.
>>>>>>>
>>>>>>>> Btw, that explains a lot why Shuah can't reproduce the stuff you're
>>>>>>>> complaining on her USB hardware.
>>>>>>>>
>>>>>>>> The USB subsystem has a a .disconnect() callback that notifies
>>>>>>>> the drivers that a device was unbound (likely physically removed).
>>>>>>>> The way USB media drivers handle it is by returning -ENODEV to any
>>>>>>>> V4L2 call that would try to touch at the hardware after unbound.
>>>>>>
>>>>>
>>>>> In my view the main problem is that the media core is bound to a struct
>>>>> device set by the driver that creates the MC. But since the MC gives an
>>>>> overview of lots of other (sub)devices the refcount of the media device
>>>>> should be increased for any (sub)device that adds itself to the MC and
>>>>> decreased for any (sub)device that is removed. Only when the very last
>>>>> user goes away can the MC memory be released.
>>>>
>>>> Correct. Media Device Allocator API work I did allows creating media device
>>>> on parent USB device to allow media sound driver share the media device. It
>>>> does ref-counting on media device and media device is unregistered only when
>>>> the last driver unregisters it.
>>>>
>>>> There is another aspect to explore regarding media device and the graph.
>>>>
>>>> Should all the entities stick around until all references to media
>>>> device are gone? If an application has /dev/media open, does that
>>>> mean all entities should not be free'd until this app. exits? What
>>>> should happen if an app. is streaming? Should the graph stay intact
>>>> until the app. exits?
>>>
>>> Yes, everything must stay around until the last user has disappeared.
>>>
>>> In general unplugs can happen at any time. So applications can be in the middle
>>> of an ioctl, and removing memory during that time is just impossible.
>>>
>>> On unplug you:
>>>
>>> 1) stop any HW DMA (highly device dependent)
>>> 2) wake up any filehandles that wait for an event
>>> 3) unregister any device nodes
>>>
>>> Then just sit back and wait for refcounts to go down as filehandles are closed
>>> by the application.
>>>
>>> Note: the v4l2/media/cec/IR/whatever core is typically responsible for rejecting
>>> any ioctls/mmap/etc. once the device node has been unregistered. The only valid
>>> file operation is release().
>>>
>>>>
>>>>    If yes, this would pose problems when we have multiple drivers bound
>>>>    to the media device. When audio driver goes away for example, it should
>>>>    be allowed to delete its entities.
>>>
>>> Only if you can safely remove it from the topology data structures while
>>> being 100% certain that nobody can ever access it. I'm not sure if that is
>>> the case. Actually, looking at e.g. adv7604.c it does media_entity_cleanup(&sd->entity);
>>> in remove() which is an empty function, so there doesn't appear any attempt
>>> to safely clean up an entity (i.e. make sure no running media ioctl can
>>> access it or call ops).
>>
>> Right. media_entity_cleanup() nothing at the moment. Also if it gets called
>> after media_device_unregister_entity(), it could pose problems. I wonder if
>> we have drivers that are calling media_entity_cleanup() after unregistering
>> the entity?
>>
>>>
>>> This probably will need to be serialized with the graph_mutex lock.
>>>
>>>>
>>>> The approach current mc-core takes is that the media_device and media_devnode
>>>> stick around, but entities can be added and removed during media_device
>>>> lifetime.
>>>
>>> Seems reasonable. But the removal needs to be done carefully, and that doesn't
>>> seem to be the case now (unless adv7604.c is just buggy).
>>
>> Correct. It is possible media_device is embedded in this driver. When driver
>> that embeds is unbound, media_device goes away. I needed to make the media
>> device refcounted and sharable for audio work and that is what the Media Device
>> Allocator API does.
> 
> Basically all you need to do is to refcount the struct device in the media_device:
> call get_device(mdev->dev) when you take a reference, and put_device(mdev->dev)
> when you no longer need it. The mdev itself is freed when the mdev->dev refcount
> goes to 0.
> 
> No need to add another kref.

Right. I do have an additional kref in Media Device Allocator API that serves
a different purpose. It is used for ref-counting drivers that are sharing the
media_device so it can't be unregistered until all those drivers unregister it.
I think we discussed this when this API was reviewed.

> 
> 
>>
>> Maybe we have more cases than this audio case that requires media_device refcounted.
>> If we have to keep entities that are in use until all the references go away, we
>> have to ref-count them as well.
>>
>>>
>>>>
>>>> If an app. is still running when media_device is unregistered, media_device
>>>> isn't released until the last reference goes away and ioctls can check if
>>>> media_device is registered or not.
>>>>
>>>> We have to decide on the larger lifetime question surrounding media_device
>>>> and graph as well.
>>>
>>> I don't think there is any choice but to keep it all alive until the last
>>> reference goes away.
>>
>> If you mean "all alive" entities as well, we have to ref-count them. Because
>> drivers can unregister entities during run-time now. I am looking at the
>> use-case where, a driver that has dvb and video and what should happen when
>> dvb is unbound for example. Should dvb entities go away or should they stay
>> until all the drivers are unbound?
> 
> That depends on the architecture. If these are completely independent devices
> then you want to allow this, if possible to do this safely. But for e.g.
> subdevices that depend on a parent device the unbind should just be prohibited.
> There is no point whatsoever in allowing that.
> 
>>
>> v4l2-core registers and unregisters entities and so does dvb-core. So when a
>> driver unregisters video and dvb, these entities get deleted. So we have a
>> distributed mode of registering and unregistering entities. We also have
>> ioctls (video, dvb, and media) accessing these entities. So where do we make
>> changes to ensure entities stick around until all users exit?
>>
>> Ref-counting entities won't work if they are embedded - like in the case of
>> struct video_device which embeds the media entity. When struct video goes
>> away then entity will disappear. So we do have a complex lifetime model here
>> that we need to figure out how to fix.
> 
> That why I think the best approach would be to safely delete them from the
> MC graph: take the top-level lock (graph_mutex I think) and remove all references
> before releasing the lock.

media_device_unregister_entity() does that now. It also removes links.
Do you believe something is missing there? It does hold graph_mutex.

> 
> I think this will work for interface entities, but for subdev entities this
> certainly won't work. Unbinding subdevs should be blocked (just set
> suppress_bind_attrs to true in all subdev drivers). Most top-level drivers
> have pointers to subdev data, so unbinding them will just fail horribly.
> 

Yes, that is an option. I did something similar for the au0828 and
snd_usb_audio case, so the module that registers the media_device can't be
unbound before the other driver. If au0828 registers the media_device, it
becomes the owner, and if it gets unbound, ioctls will start to see problems.

What this means though is that drivers can't be unbound easily. But that is
a small price to pay compared to the problems we will see if a driver is
unbound when its entities are still in use. Also, unsetting bind_attrs has
to be done as well, otherwise we can never unbind any driver.

thanks,
-- Shuah




^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 17:09                                     ` Shuah Khan
@ 2016-12-15 17:25                                       ` Mauro Carvalho Chehab
  2016-12-15 17:51                                         ` Shuah Khan
  2016-12-16 10:03                                       ` Hans Verkuil
  1 sibling, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-15 17:25 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Hans Verkuil, Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

Em Thu, 15 Dec 2016 10:09:53 -0700
Shuah Khan <shuahkh@osg.samsung.com> escreveu:

> On 12/15/2016 09:28 AM, Hans Verkuil wrote:
> > On 15/12/16 17:06, Shuah Khan wrote:  

> > 
> > I think this will work for interface entities, but for subdev entities this
> > certainly won't work. Unbinding subdevs should be blocked (just set
> > suppress_bind_attrs to true in all subdev drivers). Most top-level drivers
> > have pointers to subdev data, so unbinding them will just fail horribly.
> >   
> 
> Yes that is an option. I did something similar for au0828 and snd_usb_audio
> case, so the module that registers the media_device can't unbound until the
> other driver. If au0828 registers media_device, it becomes the owner and if
> it gets unbound ioctls will start to see problems.
> 
> What this means though is that drivers can't be unbound easily. But that is
> a small price to pay compared to the problems we will see if a driver is
> unbound when its entities are still in use. Also, unsetting bind_attrs has
> to be done as well, otherwise we can never unbind any driver.

I don't think suppress_bind_attrs will work on USB drivers, as the
device can be physically removed. 

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 17:25                                       ` Mauro Carvalho Chehab
@ 2016-12-15 17:51                                         ` Shuah Khan
  2016-12-16 10:11                                           ` Hans Verkuil
  0 siblings, 1 reply; 89+ messages in thread
From: Shuah Khan @ 2016-12-15 17:51 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Hans Verkuil, Laurent Pinchart, Sakari Ailus, Sakari Ailus,
	linux-media, Shuah Khan

On 12/15/2016 10:25 AM, Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 10:09:53 -0700
> Shuah Khan <shuahkh@osg.samsung.com> escreveu:
> 
>> On 12/15/2016 09:28 AM, Hans Verkuil wrote:
>>> On 15/12/16 17:06, Shuah Khan wrote:  
> 
>>>
>>> I think this will work for interface entities, but for subdev entities this
>>> certainly won't work. Unbinding subdevs should be blocked (just set
>>> suppress_bind_attrs to true in all subdev drivers). Most top-level drivers
>>> have pointers to subdev data, so unbinding them will just fail horribly.
>>>   
>>
>> Yes that is an option. I did something similar for au0828 and snd_usb_audio
>> case, so the module that registers the media_device can't unbound until the
>> other driver. If au0828 registers media_device, it becomes the owner and if
>> it gets unbound ioctls will start to see problems.

Sorry, I meant to say rmmod'ed, not unbound. Unbound will work just fine. If the
module that owns the media_devnode goes away, there will be problems with
cdev trying to load module when application closes the device file and exits.
In this case, Media Device Allocator API takes module reference, so its use
count goes up.

>>
>> What this means though is that drivers can't be unbound easily. But that is
>> a small price to pay compared to the problems we will see if a driver is
>> unbound when its entities are still in use. Also, unsetting bind_attrs has
>> to be done as well, otherwise we can never unbind any driver.
> 
> I don't think suppress_bind_attrs will work on USB drivers, as the
> device can be physically removed. 
> 

Yeah, setting suppress_bind_attrs would cause problems. On one hand, keeping
all entities until all references are gone sounds like a good option; however,
this would cause problems coordinating removal especially in the case of
embedded entities. Can this be done in a simpler way? The way I see it, we
have /dev/video, /dev/dvb, /dev/snd/* etc. that depend on /dev/media for
graph nodes. Any one of these devices could be open when any of the drivers
is unbound (physical removal is a simpler case).

Would it make sense to enforce that dependency? Can we tie /dev/media usecount
to /dev/video etc. usecount? In other words:

/dev/video is opened, then open /dev/media.
prevent entities being removed if /dev/media is open.

Would that help? The above could be done in a generic way possibly. Would it
help if /dev/media is kept open when streaming is active? That is just one
use-case, there might be others.
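
To make the idea concrete, here is a small userspace model (not kernel
code; all model_* names are made up). Refusing the removal with -EBUSY is
just one possible policy chosen for this sketch; deferring the removal
until the use count drops to zero would be another.

```c
#include <assert.h>
#include <errno.h>

/* Model of the media device: a use count plus the entities in the graph. */
struct model_media {
	int use_count;		/* opens of /dev/media, direct or implied */
	int num_entities;
};

/* Model of a video node that depends on the media device. */
struct model_video {
	struct model_media *media;
	int open_count;
};

/* Opening /dev/video also "opens" /dev/media on the caller's behalf. */
static void model_video_open(struct model_video *v)
{
	v->open_count++;
	v->media->use_count++;
}

/* Closing /dev/video drops the implied /dev/media use. */
static void model_video_release(struct model_video *v)
{
	v->open_count--;
	v->media->use_count--;
}

/* Entities may only be removed while nobody uses the media device. */
static int model_entity_remove(struct model_media *m)
{
	if (m->use_count > 0)
		return -EBUSY;
	if (m->num_entities == 0)
		return -ENOENT;
	m->num_entities--;
	return 0;
}
```

With this, a driver unbind that happens while /dev/video is open cannot
pull entities out from under the application; the removal has to be retried
(or deferred) once the last handle is closed.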

thanks,
-- Shuah




^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management
  2016-12-15 11:57           ` Laurent Pinchart
@ 2016-12-15 19:17             ` Shuah Khan
  0 siblings, 0 replies; 89+ messages in thread
From: Shuah Khan @ 2016-12-15 19:17 UTC (permalink / raw)
  To: Laurent Pinchart, Sakari Ailus
  Cc: Sakari Ailus, linux-media, hverkuil, mchehab, Shuah Khan

Hi Sakari,

On 12/15/2016 04:57 AM, Laurent Pinchart wrote:
> On Thursday 15 Dec 2016 13:45:25 Sakari Ailus wrote:
>> Hi Laurent,
>>
>> On 12/15/16 13:42, Laurent Pinchart wrote:
>>> You can split that part out. The devm_* removal is independent and could
>>> be moved to the beginning of the series.
>>
>> Where do you release the memory in that case? In driver's remove(), i.e.
>> this patch would simply move that code to isp_remove()?
> 
> Yes, the kfree() calls would be in isp_remove(). The patch will then be 
> faithful to its $SUBJECT, and moving to a release() handler should be done in 
> a separate patch.
> 

I have a patch that does that for you. I was playing with devm removal from
omap3. You are welcome to just use it. This also includes regulator puts in
proper places. I also included a patch that removes extra media_entity_cleanup()
calls.

I will send those in a bit.

thanks,
-- Shuah




^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 17:09                                     ` Shuah Khan
  2016-12-15 17:25                                       ` Mauro Carvalho Chehab
@ 2016-12-16 10:03                                       ` Hans Verkuil
  2016-12-16 10:12                                         ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 89+ messages in thread
From: Hans Verkuil @ 2016-12-16 10:03 UTC (permalink / raw)
  To: Shuah Khan, Laurent Pinchart, Sakari Ailus
  Cc: Mauro Carvalho Chehab, Sakari Ailus, linux-media

On 15/12/16 18:09, Shuah Khan wrote:
> On 12/15/2016 09:28 AM, Hans Verkuil wrote:
>> On 15/12/16 17:06, Shuah Khan wrote:
>>> On 12/15/2016 08:26 AM, Hans Verkuil wrote:
>>>> On 15/12/16 15:45, Shuah Khan wrote:
>>>>> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
>>>>>> On 15/12/16 13:56, Laurent Pinchart wrote:
>>>>>>> Hi Sakari,
>>>>>>>
>>>>>>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
>>>>>>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
>>>>>>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
>>>>>>>>>>> Hi Sakari,
>>>>>>>>>>>
>>>>>>>>>>> I answered you point to point below, but I suspect that you missed how
>>>>>>>>>>> the current approach works. So, I decided to write a quick summary
>>>>>>>>>>> here.
>>>>>>>>>>>
>>>>>>>>>>> The character devices /dev/media? are created via cdev, with relies on
>>>>>>>>>>> a kobject per device, with has an embedded struct kref inside.
>>>>>>>>>>>
>>>>>>>>>>> Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
>>>>>>>>>>> struct device, when the code does:
>>>>>>>>>>>   devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>>>>>>>>>>
>>>>>>>>>>> before calling cdev_add().
>>>>>>>>>>>
>>>>>>>>>>> The current lifetime management is actually based on cdev's kobject's
>>>>>>>>>>> refcount, provided by its embedded kref.
>>>>>>>>>>>
>>>>>>>>>>> The kref warrants that any data associated with /dev/media0 won't be
>>>>>>>>>>> freed if there are any pending system call. In other words, when
>>>>>>>>>>> cdev_del() is called, it will remove /dev/media0 from the filesystem,
>>>>>>>>>>> and will call kobject_put().
>>>>>>>>>>>
>>>>>>>>>>> If the refcount is zero, it will call devnode->dev.release(). If the
>>>>>>>>>>> kobject refcount is not zero, the data won't be freed.
>>>>>>>>>>>
>>>>>>>>>>> So, in the best case scenario, there are no open file descriptors
>>>>>>>>>>> by the time the media device node is unregistered. So, it will free
>>>>>>>>>>> everything.
>>>>>>>>>>>
>>>>>>>>>>> In the worst case scenario, e.g. when the driver is removed or
>>>>>>>>>>> unbound while /dev/media0 has some open file descriptor(s),
>>>>>>>>>>> the cdev logic will do the proper lifetime management.
>>>>>>>>>>>
>>>>>>>>>>> In such a case, /dev/media0 disappears from the file system, so another
>>>>>>>>>>> open is not possible anymore. The data structures will remain
>>>>>>>>>>> allocated until all associated file descriptors are closed.
>>>>>>>>>>>
>>>>>>>>>>> When all file descriptors are closed, the data will be freed.
>>>>>>>>>>>
>>>>>>>>>>> At that time, it will call an optional dev.release() callback,
>>>>>>>>>>> responsible for freeing any other data struct that the driver allocated.
>>>>>>>>>>
>>>>>>>>>> The patchset does not change this. It's not a question of the
>>>>>>>>>> media_devnode struct either. That's not an issue.
>>>>>>>>>>
>>>>>>>>>> The issue is rather what else can be accessed through the media device
>>>>>>>>>> and other interfaces. As IOCTLs are not serialised with device removal
>>>>>>>>>> (which now releases much of the data structures)
>>>>>>>>>
>>>>>>>>> Huh? ioctls are serialized with struct device removal. The driver core
>>>>>>>>> guarantees that.
>>>>>>>>
>>>>>>>> How?
>>>>>>>>
>>>>>>>> As far as I can tell, there's nothing in the way of an IOCTL being in
>>>>>>>> progress on a character device which is registered by the driver for a
>>>>>>>> hardware device which is being removed.
>>>>>>>>
>>>>>>>> vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
>>>>>>>> case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
>>>>>>>> are taken during that path, which I believe is by design.
>>>>>>
>>>>>> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
>>>>>> on release(). Thus ensuring that the cdev can never be removed while in an
>>>>>> ioctl.
>>>>>>
>>>>>>>>
>>>>>>>>>> there's a high chance of accessing
>>>>>>>>>> released memory (or mutexes that have been already destroyed). An
>>>>>>>>>> example of that is here, stopping a running pipeline after unbinding
>>>>>>>>>> the device. What happens there is that the media device is released
>>>>>>>>>> whilst it's in use through the video device.
>>>>>>>>>>
>>>>>>>>>> <URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
>>>>>>>>>
>>>>>>>>> It is not clear from the logs what the driver tried to do, but
>>>>>>>>> that sounds like a bug in the driver, which was not prepared to
>>>>>>>>> properly handle unbinds.
>>>>>>>>>
>>>>>>>>> The problem here is that isp_video_release() is called by V4L2
>>>>>>>>> release logic, and not by the MC one:
>>>>>>>>>
>>>>>>>>> static const struct v4l2_file_operations isp_video_fops = {
>>>>>>>>>       .owner          = THIS_MODULE,
>>>>>>>>>       .open           = isp_video_open,
>>>>>>>>>       .release        = isp_video_release,
>>>>>>>>>       .poll           = vb2_fop_poll,
>>>>>>>>>       .unlocked_ioctl = video_ioctl2,
>>>>>>>>>       .mmap           = vb2_fop_mmap,
>>>>>>>>> };
>>>>>>>>>
>>>>>>>>> It seems that the driver's logic allows it to be called before or
>>>>>>>>> after destroying the MC.
>>>>>>>>>
>>>>>>>>> Assuming that, if the OMAP3 driver is not used it works,
>>>>>>>>> it means that, if the isp_video_release() is called
>>>>>>>>> first, no errors will happen, but if the MC is destroyed before
>>>>>>>>> V4L2 calls its .release() callback, as there's no logic in the
>>>>>>>>> driver that would detect it, isp_video_release() will call
>>>>>>>>> isp_video_streamoff(), which depends on the MC to work.
>>>>>>>>>
>>>>>>>>> At first glance, I can see two ways of fixing it:
>>>>>>>>>
>>>>>>>>> 1) to increment devnode's device kobject refcount at OMAP3 .probe(),
>>>>>>>>> decrementing it only at isp_video_release(). That will ensure that
>>>>>>>>> MC will only be removed after V4L2 removal.
>>>>>>>
>>>>>>> As soon as you have to dig deep in a structure to find a reference counter and
>>>>>>> increment it, bypassing all the API layers, you can be entirely sure that the
>>>>>>> solution is wrong.
>>>>>>
>>>>>> Indeed.
>>>>>>
>>>>>>>
>>>>>>>>> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
>>>>>>>>> inside the MC .release() callback.
>>>>>>>>
>>>>>>>> This is a fair suggestion, indeed. Let me see what could be done there.
>>>>>>>> Albeit this is just *one* of the existing issues. It will not address all
>>>>>>>> problems fixed by the patchset.
>>>>>>>
>>>>>>> We need to stop the hardware at .remove() time. That should not be linked to a
>>>>>>> videodev, v4l2_device or media_device .release() callback. When the .remove()
>>>>>>> callback returns the driver is not allowed to touch the hardware anymore. In
>>>>>>> particular, power domains might turn off clocks or power supplies, leading to
>>>>>>> invalid access faults if we try to access hardware registers.
>>>>>>
>>>>>> Correct.
>>>>>>
>>>>>>>
>>>>>>> USB devices get help from the USB core that cancels all USB operations in
>>>>>>> progress when they're disconnected. Platform devices don't have it as easy,
>>>>>>> and need to implement everything themselves. We thus need to stop the
>>>>>>> hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF ioctl at
>>>>>>> .remove() time.
>>>>>>
>>>>>> Please don't. This shouldn't be done automatically.
>>>>>>
>>>>>>> That could introduce other races between .remove() and the
>>>>>>> userspace API. A better solution is to make sure the objects that are needed
>>>>>>> at .release() time of the device node are all reference-counted and only
>>>>>>> released when the last reference goes away.
>>>>>>>
>>>>>>> There are plenty of ways to try and work around the problem in drivers, some more
>>>>>>> racy than others, but if we require changes to all platform drivers to fix
>>>>>>> this we need to ensure that we get it right, not as half-baked hacks spread
>>>>>>> around the whole subsystem.
>>>>>>
>>>>>> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
>>>>>> device and I see no reason whatsoever to start modifying platform drivers just
>>>>>> because you can do an unbind. I know there are real hot-pluggable devices, and
>>>>>> getting this right for those is of course important.
>>>>>
>>>>> This was my first reaction when I saw this RFC series. None of the platform
>>>>> drivers are designed to be unbound. Making core changes based on such a
>>>>> driver would make the core very complex.
>>>>>
>>>>> We can't even reproduce the problem on other drivers.
>>>>>
>>>>>>
>>>>>> If the omap3 is used as a testbed, then that's fine by me, but even then I
>>>>>> probably wouldn't want the omap3 code that makes this possible in the kernel.
>>>>>> It's just additional code for no purpose.
>>>>>
>>>>> I agree with Hans. Why are we using the most complex case as a reference driver
>>>>> and basing core changes on it, which will force changes to all the drivers
>>>>> that use mc-core?
>>>>>
>>>>>>
>>>>>>>>> That could be done by overwriting the dev.release() callback in the
>>>>>>>>> omap3 driver, as I discussed in my past e-mails, and flagging to the
>>>>>>>>> driver that it should not accept streamon anymore, as the hardware
>>>>>>>>> is being disconnected.
>>>>>>>>
>>>>>>>> A mutex will be needed to serialise this with starting streaming.
>>>>>>>>
>>>>>>>>> Btw, that explains a lot why Shuah can't reproduce the stuff you're
>>>>>>>>> complaining about on her USB hardware.
>>>>>>>>>
>>>>>>>>> The USB subsystem has a .disconnect() callback that notifies
>>>>>>>>> the drivers that a device was unbound (likely physically removed).
>>>>>>>>> The way USB media drivers handle it is by returning -ENODEV to any
>>>>>>>>> V4L2 call that would try to touch the hardware after the unbind.
>>>>>>>
>>>>>>
>>>>>> In my view the main problem is that the media core is bound to a struct
>>>>>> device set by the driver that creates the MC. But since the MC gives an
>>>>>> overview of lots of other (sub)devices the refcount of the media device
>>>>>> should be increased for any (sub)device that adds itself to the MC and
>>>>>> decreased for any (sub)device that is removed. Only when the very last
>>>>>> user goes away can the MC memory be released.
>>>>>
>>>>> Correct. The Media Device Allocator API work I did allows creating the media
>>>>> device on the parent USB device so that the media and sound drivers can share
>>>>> it. It ref-counts the media device, and the media device is unregistered only
>>>>> when the last driver unregisters it.
>>>>>
>>>>> There is another aspect to explore regarding media device and the graph.
>>>>>
>>>>> Should all the entities stick around until all references to media
>>>>> device are gone? If an application has /dev/media open, does that
>>>>> mean all entities should not be free'd until this app. exits? What
>>>>> should happen if an app. is streaming? Should the graph stay intact
>>>>> until the app. exits?
>>>>
>>>> Yes, everything must stay around until the last user has disappeared.
>>>>
>>>> In general unplugs can happen at any time. So applications can be in the middle
>>>> of an ioctl, and removing memory during that time is just impossible.
>>>>
>>>> On unplug you:
>>>>
>>>> 1) stop any HW DMA (highly device dependent)
>>>> 2) wake up any filehandles that wait for an event
>>>> 3) unregister any device nodes
>>>>
>>>> Then just sit back and wait for refcounts to go down as filehandles are closed
>>>> by the application.
>>>>
>>>> Note: the v4l2/media/cec/IR/whatever core is typically responsible for rejecting
>>>> any ioctls/mmap/etc. once the device node has been unregistered. The only valid
>>>> file operation is release().
>>>>
>>>>>
>>>>>    If yes, this would pose problems when we have multiple drivers bound
>>>>>    to the media device. When audio driver goes away for example, it should
>>>>>    be allowed to delete its entities.
>>>>
>>>> Only if you can safely remove it from the topology data structures while
>>>> being 100% certain that nobody can ever access it. I'm not sure if that is
>>>> the case. Actually, looking at e.g. adv7604.c it does media_entity_cleanup(&sd->entity);
>>>> in remove() which is an empty function, so there doesn't appear to be any attempt
>>>> to safely clean up an entity (i.e. make sure no running media ioctl can
>>>> access it or call ops).
>>>
>>> Right. media_entity_cleanup() does nothing at the moment. Also if it gets called
>>> after media_device_unregister_entity(), it could pose problems. I wonder if
>>> we have drivers that are calling media_entity_cleanup() after unregistering
>>> the entity?
>>>
>>>>
>>>> This probably will need to be serialized with the graph_mutex lock.
>>>>
>>>>>
>>>>> The approach current mc-core takes is that the media_device and media_devnode
>>>>> stick around, but entities can be added and removed during media_device
>>>>> lifetime.
>>>>
>>>> Seems reasonable. But the removal needs to be done carefully, and that doesn't
>>>> seem to be the case now (unless adv7604.c is just buggy).
>>>
>>> Correct. It is possible that media_device is embedded in this driver. When the
>>> driver that embeds it is unbound, the media_device goes away. I needed to make
>>> the media device refcounted and sharable for the audio work and that is what the
>>> Media Device Allocator API does.
>>
>> Basically all you need to do is to refcount the struct device in the media_device:
>> call get_device(mdev->dev) when you take a reference, and put_device(mdev->dev)
>> when you no longer need it. The mdev itself is freed when the mdev->dev refcount
>> goes to 0.
>>
>> No need to add another kref.
>
> Right. I do have an additional kref in Media Device Allocator API that serves
> a different purpose. It is used for ref-counting drivers that are sharing the
> media_device so it can't be unregistered until all those drivers unregister it.
> I think we discussed this when this API was reviewed.

But that would be the same as using the kref of the device associated with the
media_device (mdev->dev), which presumably is the device that allocated the
media_device memory in the first place, right?

I may not have realized that when reviewing this originally.

Refcounting is hard enough without adding more krefs if they aren't necessary.
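
To make the single-refcount idea concrete, here is a minimal userspace sketch,
not kernel code. All names in it (struct fake_device, fake_get(), fake_put(),
struct fake_mdev) are made-up stand-ins for struct device, get_device(),
put_device() and the media_device. A plain int replaces the kernel's atomic
kref, and the counted object is embedded for brevity, whereas in the kernel
mdev->dev is a pointer to the parent device.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device: one refcount plus a release hook.
 * The kernel uses an atomic kref; a plain int is enough for this model. */
struct fake_device {
	int refcount;
	void (*release)(struct fake_device *dev);
};

/* Hypothetical stand-in for the media_device, with the counted object
 * embedded as its first member. */
struct fake_mdev {
	struct fake_device dev;
	int *freed;	/* set to 1 by release(), for demonstration only */
};

static void fake_mdev_release(struct fake_device *dev)
{
	/* dev is the first member, so the cast recovers the container. */
	struct fake_mdev *mdev = (struct fake_mdev *)dev;

	*mdev->freed = 1;
	free(mdev);
}

static struct fake_mdev *fake_mdev_alloc(int *freed)
{
	struct fake_mdev *mdev = calloc(1, sizeof(*mdev));

	if (!mdev)
		return NULL;
	mdev->dev.refcount = 1;	/* the owner's initial reference */
	mdev->dev.release = fake_mdev_release;
	mdev->freed = freed;
	return mdev;
}

static void fake_get(struct fake_device *dev)
{
	dev->refcount++;
}

static void fake_put(struct fake_device *dev)
{
	/* The memory goes away only when the very last user drops out. */
	if (--dev->refcount == 0)
		dev->release(dev);
}

/* Returns 1 if the memory survived the owner's put and was freed by the
 * last put, 0 otherwise. */
static int fake_mdev_demo(void)
{
	int freed = 0;
	struct fake_mdev *mdev = fake_mdev_alloc(&freed);

	if (!mdev)
		return 0;
	fake_get(&mdev->dev);	/* a second user, e.g. a video node */
	fake_put(&mdev->dev);	/* the owner unbinds */
	if (freed)
		return 0;	/* would be a premature free */
	fake_put(&mdev->dev);	/* the last user goes away */
	return freed;
}
```

The point of the sketch is that there is exactly one counter: every user takes
and drops the same reference, so there is no second kref to keep in sync.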

>
>>
>>
>>>
>>> Maybe we have more cases than this audio case that require a refcounted media_device.
>>> If we have to keep entities that are in use until all the references go away, we
>>> have to ref-count them as well.
>>>
>>>>
>>>>>
>>>>> If an app. is still running when media_device is unregistered, media_device
>>>>> isn't released until the last reference goes away and ioctls can check if
>>>>> media_device is registered or not.
>>>>>
>>>>> We have to decide on the larger lifetime question surrounding media_device
>>>>> and graph as well.
>>>>
>>>> I don't think there is any choice but to keep it all alive until the last
>>>> reference goes away.
>>>
>>> If you mean "all alive" entities as well, we have to ref-count them, because
>>> drivers can unregister entities during run-time now. I am looking at the
>>> use-case of a driver that has both dvb and video: what should happen when
>>> dvb is unbound, for example? Should dvb entities go away or should they stay
>>> until all the drivers are unbound?
>>
>> That depends on the architecture. If these are completely independent devices
>> then you want to allow this, if possible to do this safely. But for e.g.
>> subdevices that depend on a parent device the unbind should just be prohibited.
>> There is no point whatsoever in allowing that.
>>
>>>
>>> v4l2-core registers and unregisters entities and so does dvb-core. So when a
>>> driver unregisters video and dvb, these entities get deleted. So we have a
>>> distributed mode of registering and unregistering entities. We also have
>>> ioctls (video, dvb, and media) accessing these entities. So where do we make
>>> changes to ensure entities stick around until all users exit?
>>>
>>> Ref-counting entities won't work if they are embedded, as in the case of
>>> struct video_device which embeds the media entity. When the struct
>>> video_device goes away, the entity disappears with it. So we do have a
>>> complex lifetime model here that we need to figure out how to fix.
>>
>> That's why I think the best approach would be to safely delete them from the
>> MC graph: take the top-level lock (graph_mutex I think) and remove all references
>> before releasing the lock.
>
> media_device_unregister_entity() does that now. It also removes links.
> Do you believe something is missing there? It does hold graph_mutex.

Looks good.

>
>>
>> I think this will work for interface entities, but for subdev entities this
>> certainly won't work. Unbinding subdevs should be blocked (just set
>> suppress_bind_attrs to true in all subdev drivers). Most top-level drivers
>> have pointers to subdev data, so unbinding them will just fail horribly.
>>
>
> Yes, that is an option. I did something similar for the au0828 and snd_usb_audio
> case, so the module that registers the media_device can't be unbound until the
> other driver is. If au0828 registers media_device, it becomes the owner and if
> it gets unbound ioctls will start to see problems.
>
> What this means though is that drivers can't be unbound easily. But that is
> a small price to pay compared to the problems we will see if a driver is
> unbound when its entities are still in use. Also, unsetting bind_attrs has
> to be done as well, otherwise we can never unbind any driver.

So:

1) subdev drivers should disallow unbind
2) interface entities should call media_device_unregister_entity() when they
    are unregistered (if that doesn't already happen)

Regards,

	Hans

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 17:51                                         ` Shuah Khan
@ 2016-12-16 10:11                                           ` Hans Verkuil
  2016-12-16 10:57                                             ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 89+ messages in thread
From: Hans Verkuil @ 2016-12-16 10:11 UTC (permalink / raw)
  To: Shuah Khan, Mauro Carvalho Chehab
  Cc: Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

On 15/12/16 18:51, Shuah Khan wrote:
> On 12/15/2016 10:25 AM, Mauro Carvalho Chehab wrote:
>> On Thu, 15 Dec 2016 10:09:53 -0700
>> Shuah Khan <shuahkh@osg.samsung.com> wrote:
>>
>>> On 12/15/2016 09:28 AM, Hans Verkuil wrote:
>>>> On 15/12/16 17:06, Shuah Khan wrote:
>>
>>>>
>>>> I think this will work for interface entities, but for subdev entities this
>>>> certainly won't work. Unbinding subdevs should be blocked (just set
>>>> suppress_bind_attrs to true in all subdev drivers). Most top-level drivers
>>>> have pointers to subdev data, so unbinding them will just fail horribly.
>>>>
>>>
>>> Yes, that is an option. I did something similar for the au0828 and snd_usb_audio
>>> case, so the module that registers the media_device can't be unbound until the
>>> other driver is. If au0828 registers media_device, it becomes the owner and if
>>> it gets unbound ioctls will start to see problems.
>
> Sorry, I meant to say rmmod'ed, not unbound. Unbinding will work just fine. If the
> module that owns the media_devnode goes away, there will be problems with
> cdev trying to load module when application closes the device file and exits.
> In this case, Media Device Allocator API takes module reference, so its use
> count goes up.
>
>>>
>>> What this means though is that drivers can't be unbound easily. But that is
>>> a small price to pay compared to the problems we will see if a driver is
>>> unbound when its entities are still in use. Also, unsetting bind_attrs has
>>> to be done as well, otherwise we can never unbind any driver.
>>
>> I don't think suppress_bind_attrs will work on USB drivers, as the
>> device can be physically removed.
>>
>
> Yeah, setting suppress_bind_attrs would cause problems. On one hand, keeping
> all entities until all references are gone sounds like a good option; however,
> this would cause problems coordinating removal especially in the case of
> embedded entities. Can this be done in a simpler way? The way I see it, we
> have /dev/video, /dev/dvb, /dev/snd/* etc. that depend on /dev/media for
> graph nodes. Any one of these devices could be open when any of the drivers
> is unbound (physical removal is a simpler case).
>
> Would it make sense to enforce that dependency? Can we tie /dev/media usecount
> to /dev/video etc. usecount? In other words:
>
> /dev/video is opened, then open /dev/media.

When a device node is registered it should increase the refcount on the media_device
(as I proposed, that would be mdev->dev). When a device node is unregistered and the
last user disappeared, then it can decrease the media_device refcount.

So as long as anyone is using a device node, the media_device will stick around as
well.

No need to take refcounts on open/close.

One note: as I mentioned, the video_device does not set the cdev parent correctly,
so that bug needs to be fixed first for this to work.

> prevent entities being removed if /dev/media is open.
>
> Would that help? The above could be done in a generic way possibly. Would it
> help if /dev/media is kept open when streaming is active? That is just one

Again, it's not about the device nodes, it's about the media_device.

Regards,

	Hans

> use-case, there might be others.
>
> thanks,
> -- Shuah


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-16 10:03                                       ` Hans Verkuil
@ 2016-12-16 10:12                                         ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-16 10:12 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Shuah Khan, Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

On Fri, 16 Dec 2016 11:03:09 +0100
Hans Verkuil <hverkuil@xs4all.nl> wrote:

> So:
> 
> 1) subdev drivers should disallow unbind
> 2) interface entities should call media_device_unregister_entity() when they
>     are unregistered (if that doesn't already happen)

Sounds like a plan to me.

Thanks,
Mauro

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-16 10:11                                           ` Hans Verkuil
@ 2016-12-16 10:57                                             ` Mauro Carvalho Chehab
  2016-12-16 11:27                                               ` Hans Verkuil
  0 siblings, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-16 10:57 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Shuah Khan, Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

On Fri, 16 Dec 2016 11:11:25 +0100
Hans Verkuil <hverkuil@xs4all.nl> wrote:

> > Would it make sense to enforce that dependency? Can we tie /dev/media usecount
> > to /dev/video etc. usecount? In other words:
> >
> > /dev/video is opened, then open /dev/media.  
> 
> When a device node is registered it should increase the refcount on the media_device
> (as I proposed, that would be mdev->dev). When a device node is unregistered and the
> last user disappeared, then it can decrease the media_device refcount.
> 
> So as long as anyone is using a device node, the media_device will stick around as
> well.
> 
> No need to take refcounts on open/close.

That makes sense. Do you mean something like the enclosed (untested)
patch?

> One note: as I mentioned, the video_device does not set the cdev parent correctly,
> so that bug needs to be fixed first for this to work.

Actually, __video_register_device() seems to be setting the parent
properly:

	if (vdev->dev_parent == NULL)
		vdev->dev_parent = vdev->v4l2_dev->dev;

Thanks,
Mauro

[PATCH] Be sure that the media_device won't be freed too early

This code snippet is untested.

Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>

diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
index 8756275e9fc4..5fdeab382069 100644
--- a/drivers/media/media-device.c
+++ b/drivers/media/media-device.c
@@ -706,7 +706,7 @@ int __must_check __media_device_register(struct media_device *mdev,
 	struct media_devnode *devnode;
 	int ret;
 
-	devnode = kzalloc(sizeof(*devnode), GFP_KERNEL);
+	devnode = devm_kzalloc(mdev->dev, sizeof(*devnode), GFP_KERNEL);
 	if (!devnode)
 		return -ENOMEM;
 
diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
index 8be561ab2615..14a3c56dbcac 100644
--- a/drivers/media/v4l2-core/v4l2-dev.c
+++ b/drivers/media/v4l2-core/v4l2-dev.c
@@ -196,6 +196,7 @@ static void v4l2_device_release(struct device *cd)
 #if defined(CONFIG_MEDIA_CONTROLLER)
 	if (v4l2_dev->mdev) {
 		/* Remove interfaces and interface links */
+		put_device(v4l2_dev->mdev->dev);
 		media_devnode_remove(vdev->intf_devnode);
 		if (vdev->entity.function != MEDIA_ENT_F_UNKNOWN)
 			media_device_unregister_entity(&vdev->entity);
@@ -810,6 +811,7 @@ static int video_register_media_controller(struct video_device *vdev, int type)
 			return -ENOMEM;
 		}
 	}
+	get_device(vdev->v4l2_dev->dev);
 
 	/* FIXME: how to create the other interface links? */
 
@@ -1015,6 +1017,11 @@ void video_unregister_device(struct video_device *vdev)
 	if (!vdev || !video_is_registered(vdev))
 		return;
 
+#if defined(CONFIG_MEDIA_CONTROLLER)
+	if (vdev->v4l2_dev->dev)
+		put_device(vdev->v4l2_dev->dev);
+#endif
+
 	mutex_lock(&videodev_lock);
 	/* This must be in a critical section to prevent a race with v4l2_open.
 	 * Once this bit has been cleared video_get may never be called again.
diff --git a/drivers/media/v4l2-core/v4l2-device.c b/drivers/media/v4l2-core/v4l2-device.c
index 62bbed76dbbc..53f42090c762 100644
--- a/drivers/media/v4l2-core/v4l2-device.c
+++ b/drivers/media/v4l2-core/v4l2-device.c
@@ -188,6 +188,7 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
 		err = media_device_register_entity(v4l2_dev->mdev, entity);
 		if (err < 0)
 			goto error_module;
+		get_device(v4l2_dev->mdev->dev);
 	}
 #endif
 
@@ -205,6 +206,8 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
 
 error_unregister:
 #if defined(CONFIG_MEDIA_CONTROLLER)
+	if (v4l2_dev->mdev)
+		put_device(v4l2_dev->mdev->dev);
 	media_device_unregister_entity(entity);
 #endif
 error_module:
@@ -310,6 +313,7 @@ void v4l2_device_unregister_subdev(struct v4l2_subdev *sd)
 		 * links are removed by the function below, in the right order
 		 */
 		media_device_unregister_entity(&sd->entity);
+		put_device(v4l2_dev->mdev->dev);
 	}
 #endif
 	video_unregister_device(sd->devnode);





* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-16 10:57                                             ` Mauro Carvalho Chehab
@ 2016-12-16 11:27                                               ` Hans Verkuil
  2016-12-16 12:00                                                 ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 89+ messages in thread
From: Hans Verkuil @ 2016-12-16 11:27 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shuah Khan, Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

On 16/12/16 11:57, Mauro Carvalho Chehab wrote:
> On Fri, 16 Dec 2016 11:11:25 +0100
> Hans Verkuil <hverkuil@xs4all.nl> wrote:
>
>>> Would it make sense to enforce that dependency? Can we tie /dev/media usecount
>>> to /dev/video etc. usecount? In other words:
>>>
>>> /dev/video is opened, then open /dev/media.
>>
>> When a device node is registered it should increase the refcount on the media_device
>> (as I proposed, that would be mdev->dev). When a device node is unregistered and the
>> last user disappeared, then it can decrease the media_device refcount.
>>
>> So as long as anyone is using a device node, the media_device will stick around as
>> well.
>>
>> No need to take refcounts on open/close.
>
> That makes sense. Do you mean something like the enclosed (untested)
> patch?
>
>> One note: as I mentioned, the video_device does not set the cdev parent correctly,
>> so that bug needs to be fixed first for this to work.
>
> Actually, __video_register_device() seems to be setting the parent
> properly:
>
> 	if (vdev->dev_parent == NULL)
> 		vdev->dev_parent = vdev->v4l2_dev->dev;

No, I mean this code (from cec-core.c):


        /* Part 2: Initialize and register the character device */
         cdev_init(&devnode->cdev, &cec_devnode_fops);
         devnode->cdev.kobj.parent = &devnode->dev.kobj;
         devnode->cdev.owner = owner;

         ret = cdev_add(&devnode->cdev, devnode->dev.devt, 1);
         if (ret < 0) {
                 pr_err("%s: cdev_add failed\n", __func__);
                 goto clr_bit;
         }

         ret = device_add(&devnode->dev);
         if (ret)
                 goto cdev_del;

which sets cdev.kobj.parent. And that would indeed be vdev->dev_parent.
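
Why the parent assignment matters can be shown with a small userspace model. As
far as I understand fs/char_dev.c, cdev_add() takes a reference on
cdev.kobj.parent and the cdev release drops it again, so an open file handle
indirectly pins the struct device. struct toy_kobj and the toy_*() helpers
below are made up for this sketch and only imitate that behaviour.

```c
#include <assert.h>

/* Hypothetical, heavily simplified kobject. */
struct toy_kobj {
	int refs;
	struct toy_kobj *parent;
	int released;
};

static void toy_kobj_get(struct toy_kobj *k)
{
	k->refs++;
}

static void toy_kobj_put(struct toy_kobj *k)
{
	struct toy_kobj *parent;

	if (--k->refs > 0)
		return;
	k->released = 1;
	parent = k->parent;	/* reference taken in toy_cdev_add() */
	if (parent)
		toy_kobj_put(parent);
}

/* Models the parent reference taken when the cdev is added. */
static void toy_cdev_add(struct toy_kobj *cdev)
{
	if (cdev->parent)
		toy_kobj_get(cdev->parent);
}

/* Returns 1 if the devnode was released while the cdev was still in use. */
static int toy_released_too_early(int set_parent)
{
	struct toy_kobj devnode = { .refs = 1 };
	struct toy_kobj cdev = { .refs = 1 };
	int early;

	if (set_parent)
		cdev.parent = &devnode;	/* the line under discussion */
	toy_cdev_add(&cdev);

	toy_kobj_get(&cdev);	/* open(): a file handle pins the cdev */
	toy_kobj_put(&cdev);	/* cdev_del() on unregister */
	toy_kobj_put(&devnode);	/* the driver drops its device reference */

	early = devnode.released && !cdev.released;

	toy_kobj_put(&cdev);	/* close(): last user of the cdev */
	return early;
}
```

With the parent set, the devnode outlives the open handle; without it, the
devnode is released while the cdev is still referenced by the file handle.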

>
> Thanks,
> Mauro
>
> [PATCH] Be sure that the media_device won't be freed too early
>
> This code snippet is untested.
>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
>
> diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
> index 8756275e9fc4..5fdeab382069 100644
> --- a/drivers/media/media-device.c
> +++ b/drivers/media/media-device.c
> @@ -706,7 +706,7 @@ int __must_check __media_device_register(struct media_device *mdev,
>  	struct media_devnode *devnode;
>  	int ret;
>
> -	devnode = kzalloc(sizeof(*devnode), GFP_KERNEL);
> +	devnode = devm_kzalloc(mdev->dev, sizeof(*devnode), GFP_KERNEL);

I'm not sure about this change. I *think* this would work, but *only* if all
the refcounting is 100% correct, and we know it isn't at the moment. So I
think this should be postponed until we are confident everything is correct.

>  	if (!devnode)
>  		return -ENOMEM;
>
> diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
> index 8be561ab2615..14a3c56dbcac 100644
> --- a/drivers/media/v4l2-core/v4l2-dev.c
> +++ b/drivers/media/v4l2-core/v4l2-dev.c
> @@ -196,6 +196,7 @@ static void v4l2_device_release(struct device *cd)
>  #if defined(CONFIG_MEDIA_CONTROLLER)
>  	if (v4l2_dev->mdev) {
>  		/* Remove interfaces and interface links */
> +		put_device(v4l2_dev->mdev->dev);
>  		media_devnode_remove(vdev->intf_devnode);
>  		if (vdev->entity.function != MEDIA_ENT_F_UNKNOWN)
>  			media_device_unregister_entity(&vdev->entity);

I think this is the wrong order: put_device should go after media_device_unregister_entity().
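
The reason for the ordering is that the put may free the object, so nothing may
dereference it afterwards. A minimal userspace sketch of the safe order, with
invented names (struct sketch_mdev, sketch_put(), sketch_unregister_entity())
standing in for the real media_device helpers:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical media device with a refcount and an entity counter. */
struct sketch_mdev {
	int refs;
	int nr_entities;
	int *freed;	/* set to 1 when the memory is released */
};

static void sketch_put(struct sketch_mdev *mdev)
{
	if (--mdev->refs == 0) {
		*mdev->freed = 1;
		free(mdev);
	}
}

static void sketch_unregister_entity(struct sketch_mdev *mdev)
{
	mdev->nr_entities--;	/* dereferences mdev, so it must be alive */
}

/* Teardown in the safe order: last use of mdev first, then the put. */
static void sketch_release_node(struct sketch_mdev *mdev)
{
	sketch_unregister_entity(mdev);
	sketch_put(mdev);	/* may free mdev; do not touch it afterwards */
}

/* Returns 1 if the media device was freed by the final put, -1 on
 * allocation failure. */
static int sketch_demo(void)
{
	int freed = 0;
	struct sketch_mdev *mdev = malloc(sizeof(*mdev));

	if (!mdev)
		return -1;
	mdev->refs = 1;
	mdev->nr_entities = 1;
	mdev->freed = &freed;
	sketch_release_node(mdev);
	return freed;
}
```

Swapping the two calls in sketch_release_node() would make
sketch_unregister_entity() operate on freed memory whenever the node held the
last reference.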

> @@ -810,6 +811,7 @@ static int video_register_media_controller(struct video_device *vdev, int type)
>  			return -ENOMEM;
>  		}
>  	}
> +	get_device(vdev->v4l2_dev->dev);

You mean v4l2_dev->mdev->dev?

>
>  	/* FIXME: how to create the other interface links? */
>
> @@ -1015,6 +1017,11 @@ void video_unregister_device(struct video_device *vdev)
>  	if (!vdev || !video_is_registered(vdev))
>  		return;
>
> +#if defined(CONFIG_MEDIA_CONTROLLER)
> +	if (vdev->v4l2_dev->dev)
> +		put_device(vdev->v4l2_dev->dev);

Ditto.

> +#endif
> +
>  	mutex_lock(&videodev_lock);
>  	/* This must be in a critical section to prevent a race with v4l2_open.
>  	 * Once this bit has been cleared video_get may never be called again.
> diff --git a/drivers/media/v4l2-core/v4l2-device.c b/drivers/media/v4l2-core/v4l2-device.c
> index 62bbed76dbbc..53f42090c762 100644
> --- a/drivers/media/v4l2-core/v4l2-device.c
> +++ b/drivers/media/v4l2-core/v4l2-device.c
> @@ -188,6 +188,7 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
>  		err = media_device_register_entity(v4l2_dev->mdev, entity);
>  		if (err < 0)
>  			goto error_module;
> +		get_device(v4l2_dev->mdev->dev);
>  	}
>  #endif
>
> @@ -205,6 +206,8 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
>
>  error_unregister:
>  #if defined(CONFIG_MEDIA_CONTROLLER)
> +	if (v4l2_dev->mdev)
> +		put_device(v4l2_dev->mdev->dev);
>  	media_device_unregister_entity(entity);
>  #endif
>  error_module:
> @@ -310,6 +313,7 @@ void v4l2_device_unregister_subdev(struct v4l2_subdev *sd)
>  		 * links are removed by the function below, in the right order
>  		 */
>  		media_device_unregister_entity(&sd->entity);
> +		put_device(v4l2_dev->mdev->dev);
>  	}
>  #endif
>  	video_unregister_device(sd->devnode);
>
>
>
>

Regards,

	Hans

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-16 11:27                                               ` Hans Verkuil
@ 2016-12-16 12:00                                                 ` Mauro Carvalho Chehab
  2016-12-16 14:45                                                   ` Hans Verkuil
  0 siblings, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-16 12:00 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Shuah Khan, Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

On Fri, 16 Dec 2016 11:27 UTC
Hans Verkuil <hverkuil@xs4all.nl> wrote:

> On 16/12/16 11:57, Mauro Carvalho Chehab wrote:
> > On Fri, 16 Dec 2016 11:11:25 +0100
> > Hans Verkuil <hverkuil@xs4all.nl> wrote:
> >  
> >>> Would it make sense to enforce that dependency? Can we tie /dev/media usecount
> >>> to /dev/video etc. usecount? In other words:
> >>>
> >>> /dev/video is opened, then open /dev/media.  
> >>
> >> When a device node is registered it should increase the refcount on the media_device
> >> (as I proposed, that would be mdev->dev). When a device node is unregistered and the
> >> last user disappeared, then it can decrease the media_device refcount.
> >>
> >> So as long as anyone is using a device node, the media_device will stick around as
> >> well.
> >>
> >> No need to take refcounts on open/close.  
> >
> > That makes sense. You're meaning something like the enclosed (untested)
> > patch?
> >  
> >> One note: as I mentioned, the video_device does not set the cdev parent correctly,
> >> so that bug needs to be fixed first for this to work.  
> >
> > Actually, __video_register_device() seems to be setting the parent
> > properly:
> >
> > 	if (vdev->dev_parent == NULL)
> > 		vdev->dev_parent = vdev->v4l2_dev->dev;  
> 
> No, I mean this code (from cec-core.c):
> 
> 
>         /* Part 2: Initialize and register the character device */
>          cdev_init(&devnode->cdev, &cec_devnode_fops);
>          devnode->cdev.kobj.parent = &devnode->dev.kobj;
>          devnode->cdev.owner = owner;
> 
>          ret = cdev_add(&devnode->cdev, devnode->dev.devt, 1);
>          if (ret < 0) {
>                  pr_err("%s: cdev_add failed\n", __func__);
>                  goto clr_bit;
>          }
> 
>          ret = device_add(&devnode->dev);
>          if (ret)
>                  goto cdev_del;
> 
> which sets cdev.kobj.parent. And that would indeed be vdev->dev_parent.

Ah! So, you're basically proposing to have a separate struct for
V4L2 devnode as well, right?

Makes sense.

> 
> >
> > Thanks,
> > Mauro
> >
> > [PATCH] Be sure that the media_device won't be freed too early
> >
> > This code snippet is untested.
> >
> > Signed-off-by: Mauro Carvalho chehab <mchehab@s-opensource.com>
> >
> > diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
> > index 8756275e9fc4..5fdeab382069 100644
> > --- a/drivers/media/media-device.c
> > +++ b/drivers/media/media-device.c
> > @@ -706,7 +706,7 @@ int __must_check __media_device_register(struct media_device *mdev,
> >  	struct media_devnode *devnode;
> >  	int ret;
> >
> > -	devnode = kzalloc(sizeof(*devnode), GFP_KERNEL);
> > +	devnode = devm_kzalloc(mdev->dev, sizeof(*devnode), GFP_KERNEL);  
> 
> I'm not sure about this change. I *think* this would work, but *only* if all
> the refcounting is 100% correct, and we know it isn't at the moment. So I
> think this should be postponed until we are confident everything is correct.

Yes, such a change would first require making sure that drivers are
doing the right thing.

> 
> >  	if (!devnode)
> >  		return -ENOMEM;
> >
> > diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
> > index 8be561ab2615..14a3c56dbcac 100644
> > --- a/drivers/media/v4l2-core/v4l2-dev.c
> > +++ b/drivers/media/v4l2-core/v4l2-dev.c
> > @@ -196,6 +196,7 @@ static void v4l2_device_release(struct device *cd)
> >  #if defined(CONFIG_MEDIA_CONTROLLER)
> >  	if (v4l2_dev->mdev) {
> >  		/* Remove interfaces and interface links */
> > +		put_device(v4l2_dev->mdev->dev);
> >  		media_devnode_remove(vdev->intf_devnode);
> >  		if (vdev->entity.function != MEDIA_ENT_F_UNKNOWN)
> >  			media_device_unregister_entity(&vdev->entity);  
> 
> I think this is the wrong order: put_device should go after media_device_unregister_entity().

OK.

> 
> > @@ -810,6 +811,7 @@ static int video_register_media_controller(struct video_device *vdev, int type)
> >  			return -ENOMEM;
> >  		}
> >  	}
> > +	get_device(vdev->v4l2_dev->dev);  
> 
> You mean v4l2_dev->mdev->dev?

Yes, that's right (vdev->v4l2_dev->mdev->dev).
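
To spell out the two corrections agreed above (take the reference on
mdev->dev, and drop it only after the entity has been unregistered), here is
a rough user-space model. It is only a sketch: every model_* name is made up
for illustration and is not a kernel API, and the plain int counter merely
stands in for get_device()/put_device().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct model_dev {
	int refcount;
};

struct model_mdev {
	struct model_dev *dev;	/* plays the role of mdev->dev */
	int nentities;
};

struct model_vdev {
	struct model_mdev *mdev;
	int entity_registered;
};

static void model_get(struct model_dev *d)
{
	d->refcount++;
}

static void model_put(struct model_dev *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}

/* Registration: register the entity, then pin the media device's device. */
void model_vdev_register(struct model_vdev *v, struct model_mdev *m)
{
	v->mdev = m;
	m->nentities++;
	v->entity_registered = 1;
	model_get(m->dev);
}

/* Release: undo everything that touches mdev first, drop the reference last. */
void model_vdev_release(struct model_vdev *v)
{
	struct model_mdev *m = v->mdev;

	if (!m)
		return;
	if (v->entity_registered) {
		m->nentities--;
		v->entity_registered = 0;
	}
	v->mdev = NULL;
	model_put(m->dev);
}
```

The point of the ordering is that nothing in the release path dereferences
the media device after the reference that keeps it alive has been dropped.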

> 
> >
> >  	/* FIXME: how to create the other interface links? */
> >
> > @@ -1015,6 +1017,11 @@ void video_unregister_device(struct video_device *vdev)
> >  	if (!vdev || !video_is_registered(vdev))
> >  		return;
> >
> > +#if defined(CONFIG_MEDIA_CONTROLLER)
> > +	if (vdev->v4l2_dev->dev)
> > +		put_device(vdev->v4l2_dev->dev);  
> 
> Ditto.
> 
> > +#endif
> > +
> >  	mutex_lock(&videodev_lock);
> >  	/* This must be in a critical section to prevent a race with v4l2_open.
> >  	 * Once this bit has been cleared video_get may never be called again.
> > diff --git a/drivers/media/v4l2-core/v4l2-device.c b/drivers/media/v4l2-core/v4l2-device.c
> > index 62bbed76dbbc..53f42090c762 100644
> > --- a/drivers/media/v4l2-core/v4l2-device.c
> > +++ b/drivers/media/v4l2-core/v4l2-device.c
> > @@ -188,6 +188,7 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
> >  		err = media_device_register_entity(v4l2_dev->mdev, entity);
> >  		if (err < 0)
> >  			goto error_module;
> > +		get_device(v4l2_dev->mdev->dev);
> >  	}
> >  #endif
> >
> > @@ -205,6 +206,8 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
> >
> >  error_unregister:
> >  #if defined(CONFIG_MEDIA_CONTROLLER)
> > +	if (v4l2_dev->mdev)
> > +		put_device(v4l2_dev->mdev->dev);
> >  	media_device_unregister_entity(entity);
> >  #endif
> >  error_module:
> > @@ -310,6 +313,7 @@ void v4l2_device_unregister_subdev(struct v4l2_subdev *sd)
> >  		 * links are removed by the function below, in the right order
> >  		 */
> >  		media_device_unregister_entity(&sd->entity);
> > +		put_device(v4l2_dev->mdev->dev);
> >  	}
> >  #endif
> >  	video_unregister_device(sd->devnode);
> >
> >
> >
> >  
> 
> Regards,
> 
> 	Hans



Thanks,
Mauro


* Re: [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management
  2016-12-15 11:23   ` Laurent Pinchart
  2016-12-15 11:39     ` Sakari Ailus
@ 2016-12-16 13:32     ` Sakari Ailus
  2016-12-16 14:39       ` Shuah Khan
  1 sibling, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2016-12-16 13:32 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Sakari Ailus, linux-media, hverkuil, mchehab, shuahkh

Hi Laurent,

On Thu, Dec 15, 2016 at 01:23:50PM +0200, Laurent Pinchart wrote:
> > @@ -1596,7 +1604,6 @@ static void isp_unregister_entities(struct isp_device *isp)
> >  	omap3isp_stat_unregister_entities(&isp->isp_af);
> >  	omap3isp_stat_unregister_entities(&isp->isp_hist);
> > 
> > -	v4l2_device_unregister(&isp->v4l2_dev);
> 
> This isn't correct. The v4l2_device instance should be unregistered here, to 
> make sure that the subdev nodes are unregistered too. And even if moving the 
> function call was correct, it should be done in a separate patch as it's 
> unrelated to $SUBJECT.

I think I tried to fix another problem here that we haven't considered before:
v4l2_device_unregister() also unregisters the entities through
media_device_unregister_entity(), which sets the media device of the
graph objects to NULL.

I'll see whether something can be done about that.

-- 
Regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk


* Re: [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management
  2016-12-16 13:32     ` Sakari Ailus
@ 2016-12-16 14:39       ` Shuah Khan
  0 siblings, 0 replies; 89+ messages in thread
From: Shuah Khan @ 2016-12-16 14:39 UTC (permalink / raw)
  To: Sakari Ailus, Laurent Pinchart
  Cc: Sakari Ailus, linux-media, hverkuil, mchehab, Shuah Khan

On 12/16/2016 06:32 AM, Sakari Ailus wrote:
> Hi Laurent,
> 
> On Thu, Dec 15, 2016 at 01:23:50PM +0200, Laurent Pinchart wrote:
>>> @@ -1596,7 +1604,6 @@ static void isp_unregister_entities(struct isp_device *isp)
>>>  	omap3isp_stat_unregister_entities(&isp->isp_af);
>>>  	omap3isp_stat_unregister_entities(&isp->isp_hist);
>>>
>>> -	v4l2_device_unregister(&isp->v4l2_dev);
>>
>> This isn't correct. The v4l2_device instance should be unregistered here, to 
>> make sure that the subdev nodes are unregistered too. And even if moving the 
>> function call was correct, it should be done in a separate patch as it's 
>> unrelated to $SUBJECT.
> 
> I think I tried to fix another problem here we haven't considered before,
> which is that v4l2_device_unregister() also unregisters the entities through
> media_device_unregister_entity(). This will set the media device of the
> graph objects NULL.
> 
> I'll see whether something could be done to that.
> 

Right. That is what I was pointing out when I said the cleanup routines are
done in the wrong place. Entity registration and unregistration are distributed
in nature: v4l2 register registers its entities and v4l2 unregister unregisters
them. dvb does the same.

So essentially entities get added and removed when any of these drivers get
bound or unbound. Please see the following, which I posted on

[RFC v3 00/21] Make use of kref in media device, grab references as needed

> v4l2-core registers and unregisters entities and so does dvb-core. So when a
> driver unregisters video and dvb, these entities get deleted. So we have a
> distributed mode of registering and unregistering entities. We also have
> ioctls (video, dvb, and media) accessing these entities. So where do we make
> changes to ensure entities stick around until all users exit?
>
> Ref-counting entities won't work if they are embedded - like in the case of
> struct video_device which embeds the media entity. When struct video goes
> away then entity will disappear. So we do have a complex lifetime model here
> that we need to figure out how to fix.
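
As a rough illustration of the embedding problem in the quoted text, the
user-space sketch below models an entity embedded in a video device. All
model_* names are hypothetical and not kernel APIs. It shows one possible
reading of the constraint, not a proposed fix: the embedded entity has no
lifetime of its own, so any "reference to the entity" can only be a
reference to its container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's container_of(). */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct model_entity {
	int id;
};

struct model_video_device {
	int refcount;
	struct model_entity entity;	/* embedded: shares the container's memory */
};

struct model_video_device *model_vdev_alloc(int id)
{
	struct model_video_device *v = calloc(1, sizeof(*v));

	if (!v)
		return NULL;
	v->refcount = 1;
	v->entity.id = id;
	return v;
}

/* Taking a reference to the entity really pins the containing video device. */
struct model_entity *model_entity_get(struct model_entity *e)
{
	model_container_of(e, struct model_video_device, entity)->refcount++;
	return e;
}

/* Returns 1 if this put freed the container (and the entity with it). */
int model_entity_put(struct model_entity *e)
{
	struct model_video_device *v =
		model_container_of(e, struct model_video_device, entity);

	if (--v->refcount > 0)
		return 0;
	free(v);
	return 1;
}
```

Whichever counter is used, the entity disappears exactly when the struct
that embeds it is freed, which is why the lifetime question has to be
answered for the container.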

thanks,
-- Shuah


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-16 12:00                                                 ` Mauro Carvalho Chehab
@ 2016-12-16 14:45                                                   ` Hans Verkuil
  2016-12-19  9:28                                                     ` Media summit in Feb? - Was: " Mauro Carvalho Chehab
  0 siblings, 1 reply; 89+ messages in thread
From: Hans Verkuil @ 2016-12-16 14:45 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Shuah Khan, Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

On 16/12/16 13:00, Mauro Carvalho Chehab wrote:
> Hans Verkuil wrote:
>
>> On 16/12/16 11:57, Mauro Carvalho Chehab wrote:
>>> Em Fri, 16 Dec 2016 11:11:25 +0100
>>> Hans Verkuil <hverkuil@xs4all.nl> escreveu:
>>>
>>>>> Would it make sense to enforce that dependency. Can we tie /dev/media usecount
>>>>> to /dev/video etc. usecount? In other words:
>>>>>
>>>>> /dev/video is opened, then open /dev/media.
>>>>
>>>> When a device node is registered it should increase the refcount on the media_device
>>>> (as I proposed, that would be mdev->dev). When a device node is unregistered and the
>>>> last user disappeared, then it can decrease the media_device refcount.
>>>>
>>>> So as long as anyone is using a device node, the media_device will stick around as
>>>> well.
>>>>
>>>> No need to take refcounts on open/close.
>>>
>>> That makes sense. You're meaning something like the enclosed (untested)
>>> patch?
>>>
>>>> One note: as I mentioned, the video_device does not set the cdev parent correctly,
>>>> so that bug needs to be fixed first for this to work.
>>>
>>> Actually, __video_register_device() seems to be setting the parent
>>> properly:
>>>
>>> 	if (vdev->dev_parent == NULL)
>>> 		vdev->dev_parent = vdev->v4l2_dev->dev;
>>
>> No, I mean this code (from cec-core.c):
>>
>>
>>         /* Part 2: Initialize and register the character device */
>>          cdev_init(&devnode->cdev, &cec_devnode_fops);
>>          devnode->cdev.kobj.parent = &devnode->dev.kobj;
>>          devnode->cdev.owner = owner;
>>
>>          ret = cdev_add(&devnode->cdev, devnode->dev.devt, 1);
>>          if (ret < 0) {
>>                  pr_err("%s: cdev_add failed\n", __func__);
>>                  goto clr_bit;
>>          }
>>
>>          ret = device_add(&devnode->dev);
>>          if (ret)
>>                  goto cdev_del;
>>
>> which sets cdev.kobj.parent. And that would indeed be vdev->dev_parent.
>
> Ah! So, you're basically proposing to have a separate struct for
> V4L2 devnode as well, right?
>
> Makes sense.

No need for that, that's already struct video_device.

>
>>
>>>
>>> Thanks,
>>> Mauro
>>>
>>> [PATCH] Be sure that the media_device won't be freed too early
>>>
>>> This code snippet is untested.
>>>
>>> Signed-off-by: Mauro Carvalho chehab <mchehab@s-opensource.com>
>>>
>>> diff --git a/drivers/media/media-device.c b/drivers/media/media-device.c
>>> index 8756275e9fc4..5fdeab382069 100644
>>> --- a/drivers/media/media-device.c
>>> +++ b/drivers/media/media-device.c
>>> @@ -706,7 +706,7 @@ int __must_check __media_device_register(struct media_device *mdev,
>>>  	struct media_devnode *devnode;
>>>  	int ret;
>>>
>>> -	devnode = kzalloc(sizeof(*devnode), GFP_KERNEL);
>>> +	devnode = devm_kzalloc(mdev->dev, sizeof(*devnode), GFP_KERNEL);
>>
>> I'm not sure about this change. I *think* this would work, but *only* if all
>> the refcounting is 100% correct, and we know it isn't at the moment. So I
>> think this should be postponed until we are confident everything is correct.
>
> Yes, such change will require first to be sure that drivers would be
> doing the right thing.

So devm_ resources are released right after remove() exits, not when the last reference
goes to 0. In other words, devm_ typically can't be used for these complex scenarios,
certainly not for memory. See the discussion with Laurent on IRC.
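
A small user-space model of the difference, as a sketch only: the model_*
names are invented for illustration, and a "released" flag stands in for
actually freeing memory so that the example stays well defined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an object shared by a driver and open file handles. */
struct model_obj {
	int refcount;		/* driver binding plus open file handles */
	bool devm_managed;	/* true: lifetime tied to the driver binding */
	bool released;		/* stands in for the memory being freed */
};

void model_obj_init(struct model_obj *o, bool devm_managed)
{
	o->refcount = 1;	/* the driver's own reference */
	o->devm_managed = devm_managed;
	o->released = false;
}

static void model_put(struct model_obj *o)
{
	/* Refcounted memory is released only when the last user is gone. */
	if (--o->refcount == 0 && !o->devm_managed)
		o->released = true;
}

void model_open(struct model_obj *o)
{
	o->refcount++;
}

void model_close(struct model_obj *o)
{
	model_put(o);
}

/*
 * Driver unbind: devm-style memory goes away once remove() has returned,
 * whatever the refcount says. Otherwise only the driver's reference is dropped.
 */
void model_unbind(struct model_obj *o)
{
	if (o->devm_managed)
		o->released = true;
	model_put(o);
}

/* True if a still-open file handle would now touch released memory. */
bool model_use_after_release(const struct model_obj *o)
{
	return o->released && o->refcount > 0;
}
```

With devm_managed set, an unbind while a file handle is open leaves that
handle pointing at released memory. Without it, the memory survives until
the final close.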

>
>>
>>>  	if (!devnode)
>>>  		return -ENOMEM;
>>>
>>> diff --git a/drivers/media/v4l2-core/v4l2-dev.c b/drivers/media/v4l2-core/v4l2-dev.c
>>> index 8be561ab2615..14a3c56dbcac 100644
>>> --- a/drivers/media/v4l2-core/v4l2-dev.c
>>> +++ b/drivers/media/v4l2-core/v4l2-dev.c
>>> @@ -196,6 +196,7 @@ static void v4l2_device_release(struct device *cd)
>>>  #if defined(CONFIG_MEDIA_CONTROLLER)
>>>  	if (v4l2_dev->mdev) {
>>>  		/* Remove interfaces and interface links */
>>> +		put_device(v4l2_dev->mdev->dev);
>>>  		media_devnode_remove(vdev->intf_devnode);
>>>  		if (vdev->entity.function != MEDIA_ENT_F_UNKNOWN)
>>>  			media_device_unregister_entity(&vdev->entity);
>>
>> I think this is the wrong order: put_device should go after media_device_unregister_entity().
>
> OK.
>
>>
>>> @@ -810,6 +811,7 @@ static int video_register_media_controller(struct video_device *vdev, int type)
>>>  			return -ENOMEM;
>>>  		}
>>>  	}
>>> +	get_device(vdev->v4l2_dev->dev);
>>
>> You mean v4l2_dev->mdev->dev?
>
> Yes, that's right (vdev->v4l2_dev->mdev->dev).

Laurent helped me realize that this won't work either: mdev->dev is typically a 
platform/pci/usb device, and that won't go away when you rmmod the driver.

So while taking a refcount on that device doesn't hurt, we also need to take a refcount
on a kref inside the mdev. Just like v4l2_device this struct has an unfortunate name.
It's not a device, but a root structure for media devices.
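
Roughly what I have in mind, as a user-space sketch only: model_kref mimics
the get/put/release pattern of a kref, and every model_* name is made up for
illustration rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical miniature of a kref: a counter plus a release callback. */
struct model_kref {
	int count;
};

static void model_kref_init(struct model_kref *k)
{
	k->count = 1;
}

static void model_kref_get(struct model_kref *k)
{
	k->count++;
}

/* Returns 1 if the release callback ran. */
static int model_kref_put(struct model_kref *k,
			  void (*release)(struct model_kref *))
{
	if (--k->count > 0)
		return 0;
	release(k);
	return 1;
}

/* The root structure carries its own refcount, independent of the bus device. */
struct model_media_device {
	struct model_kref ref;
	int nentities;
};

int model_mdev_freed;	/* counts releases, for illustration only */

static void model_mdev_release(struct model_kref *k)
{
	struct model_media_device *m = (struct model_media_device *)
		((char *)k - offsetof(struct model_media_device, ref));

	free(m);
	model_mdev_freed++;
}

struct model_media_device *model_mdev_alloc(void)
{
	struct model_media_device *m = calloc(1, sizeof(*m));

	if (m)
		model_kref_init(&m->ref);
	return m;
}

struct model_media_device *model_mdev_get(struct model_media_device *m)
{
	model_kref_get(&m->ref);
	return m;
}

int model_mdev_put(struct model_media_device *m)
{
	return model_kref_put(&m->ref, model_mdev_release);
}
```

In this model each registered device node would call model_mdev_get(), the
driver would drop its initial reference at unbind/rmmod, and the structure is
only freed by whoever does the last model_mdev_put().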

We really need a whiteboard for this :-(

Regards,

	Hans

>
>>
>>>
>>>  	/* FIXME: how to create the other interface links? */
>>>
>>> @@ -1015,6 +1017,11 @@ void video_unregister_device(struct video_device *vdev)
>>>  	if (!vdev || !video_is_registered(vdev))
>>>  		return;
>>>
>>> +#if defined(CONFIG_MEDIA_CONTROLLER)
>>> +	if (vdev->v4l2_dev->dev)
>>> +		put_device(vdev->v4l2_dev->dev);
>>
>> Ditto.
>>
>>> +#endif
>>> +
>>>  	mutex_lock(&videodev_lock);
>>>  	/* This must be in a critical section to prevent a race with v4l2_open.
>>>  	 * Once this bit has been cleared video_get may never be called again.
>>> diff --git a/drivers/media/v4l2-core/v4l2-device.c b/drivers/media/v4l2-core/v4l2-device.c
>>> index 62bbed76dbbc..53f42090c762 100644
>>> --- a/drivers/media/v4l2-core/v4l2-device.c
>>> +++ b/drivers/media/v4l2-core/v4l2-device.c
>>> @@ -188,6 +188,7 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
>>>  		err = media_device_register_entity(v4l2_dev->mdev, entity);
>>>  		if (err < 0)
>>>  			goto error_module;
>>> +		get_device(v4l2_dev->mdev->dev);
>>>  	}
>>>  #endif
>>>
>>> @@ -205,6 +206,8 @@ int v4l2_device_register_subdev(struct v4l2_device *v4l2_dev,
>>>
>>>  error_unregister:
>>>  #if defined(CONFIG_MEDIA_CONTROLLER)
>>> +	if (v4l2_dev->mdev)
>>> +		put_device(v4l2_dev->mdev->dev);
>>>  	media_device_unregister_entity(entity);
>>>  #endif
>>>  error_module:
>>> @@ -310,6 +313,7 @@ void v4l2_device_unregister_subdev(struct v4l2_subdev *sd)
>>>  		 * links are removed by the function below, in the right order
>>>  		 */
>>>  		media_device_unregister_entity(&sd->entity);
>>> +		put_device(v4l2_dev->mdev->dev);
>>>  	}
>>>  #endif
>>>  	video_unregister_device(sd->devnode);
>>>
>>>
>>>
>>>
>>
>> Regards,
>>
>> 	Hans
>
>
>
> Thanks,
> Mauro



* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 14:03                           ` Hans Verkuil
  2016-12-15 14:32                             ` Mauro Carvalho Chehab
  2016-12-15 14:45                             ` Shuah Khan
@ 2016-12-16 15:07                             ` Sakari Ailus
  2016-12-16 16:34                               ` Laurent Pinchart
  2016-12-19  9:46                               ` Mauro Carvalho Chehab
  2 siblings, 2 replies; 89+ messages in thread
From: Sakari Ailus @ 2016-12-16 15:07 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Laurent Pinchart, Mauro Carvalho Chehab, Shuah Khan,
	Sakari Ailus, linux-media

Hi Hans,

On Thu, Dec 15, 2016 at 03:03:36PM +0100, Hans Verkuil wrote:
> On 15/12/16 13:56, Laurent Pinchart wrote:
> >Hi Sakari,
> >
> >On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >>On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> >>>Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> >>>>On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> >>>>>Hi Sakari,
> >>>>>
> >>>>>I answered you point to point below, but I suspect that you missed how
> >>>>>the current approach works. So, I decided to write a quick summary
> >>>>>here.
> >>>>>
> >>>>>The character devices /dev/media? are created via cdev, which relies on
> >>>>>a kobject per device, which has an embedded struct kref inside.
> >>>>>
> >>>>>Also, each kobj at /dev/media0, /dev/media1, ... is associated with a
> >>>>>struct device, when the code does:
> >>>>>  devnode->cdev.kobj.parent = &devnode->dev.kobj;
> >>>>>
> >>>>>before calling cdev_add().
> >>>>>
> >>>>>The current lifetime management is actually based on cdev's kobject's
> >>>>>refcount, provided by its embedded kref.
> >>>>>
> >>>>>The kref warrants that any data associated with /dev/media0 won't be
> >>>>>freed if there are any pending system call. In other words, when
> >>>>>cdev_del() is called, it will remove /dev/media0 from the filesystem,
> >>>>>and will call kobject_put().
> >>>>>
> >>>>>If the refcount is zero, it will call devnode->dev.release(). If the
> >>>>>kobject refcount is not zero, the data won't be freed.
> >>>>>
> >>>>>So, in the best case scenario, there's no opened file descriptors
> >>>>>by the time media device node is unregistered. So, it will free
> >>>>>everything.
> >>>>>
> >>>>>In the worse case scenario, e. g. when the driver is removed or
> >>>>>unbind while /dev/media0 has some opened file descriptor(s),
> >>>>>the cdev logic will do the proper lifetime management.
> >>>>>
> >>>>>On such case, /dev/media0 disappears from the file system, so another
> >>>>>open is not possible anymore. The data structures will remain
> >>>>>allocated until all associated file descriptors are not closed.
> >>>>>
> >>>>>When all file descriptors are closed, the data will be freed.
> >>>>>
> >>>>>On that time, it will call an optional dev.release() callback,
> >>>>>responsible to free any other data struct that the driver allocated.
> >>>>
> >>>>The patchset does not change this. It's not a question of the
> >>>>media_devnode struct either. That's not an issue.
> >>>>
> >>>>The issue is rather what else can be accessed through the media device
> >>>>and other interfaces. As IOCTLs are not serialised with device removal
> >>>>(which now releases much of the data structures)
> >>>
> >>>Huh? ioctls are serialized with struct device removal. The Driver core
> >>>warrants that.
> >>
> >>How?
> >>
> >>As far as I can tell, there's nothing in the way of an IOCTL being in
> >>progress on a character device which is registered by the driver for a
> >>hardware device which is being removed.
> >>
> >>vfs_ioctl() directly calls the unlocked_ioctl() file operation which is, in
> >>case of MC, media_ioctl() in media-devnode.c. No mutexes (or other locks)
> >>are taken during that path, which I believe is by design.
> 
> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
> on release(). Thus ensuring that the cdev can never be removed while in an
> ioctl.

It does, but it does not affect memory which is allocated separately from that.

See this:

<URL:https://www.mail-archive.com/linux-media@vger.kernel.org/msg106390.html>

> 
> >>
> >>>>there's a high chance of accessing
> >>>>released memory (or mutexes that have been already destroyed). An
> >>>>example of that is here, stopping a running pipeline after unbinding
> >>>>the device. What happens there is that the media device is released
> >>>>whilst it's in use through the video device.
> >>>>
> >>>><URL:http://www.retiisi.org.uk/v4l2/tmp/media-ref-dmesg2.txt>
> >>>
> >>>It is not clear from the logs what the driver tried to do, but
> >>>that sounds like a driver's bug, with was not prepared to properly
> >>>handle unbinds.
> >>>
> >>>The problem here is that isp_video_release() is called by V4L2
> >>>release logic, and not by the MC one:
> >>>
> >>>static const struct v4l2_file_operations isp_video_fops = {
> >>>      .owner          = THIS_MODULE,
> >>>      .open           = isp_video_open,
> >>>      .release        = isp_video_release,
> >>>      .poll           = vb2_fop_poll,
> >>>      .unlocked_ioctl = video_ioctl2,
> >>>      .mmap           = vb2_fop_mmap,
> >>>};
> >>>
> >>>It seems that the driver's logic allows it to be called before or
> >>>after destroying the MC.
> >>>
> >>>Assuming that, if the OMAP3 driver is not used it works,
> >>>it means that, if the isp_video_release() is called
> >>>first, no errors will happen, but if MC is destroyed before
> >>>V4L2 call to its .release() callback, as there's no logic at the
> >>>driver that would detect it, isp_video_release() will be calling
> >>>isp_video_streamoff(), with depends on the MC to work.
> >>>
> >>>On a first glance, I can see two ways of fixing it:
> >>>
> >>>1) to increment devnode's device kobject refcount at OMAP3 .probe(),
> >>>decrementing it only at isp_video_release(). That will ensure that
> >>>MC will only be removed after V4L2 removal.
> >
> >As soon as you have to dig deep in a structure to find a reference counter and
> >increment it, bypassing all the API layers, you can be entirely sure that the
> >solution is wrong.
> 
> Indeed.
> 
> >
> >>>2) to call isp_video_streamoff() before removing the MC stuff, e. g.
> >>>inside the MC .release() callback.
> >>
> >>This is a fair suggestion, indeed. Let me see what could be done there.
> >>Albeit this is just *one* of the existing issues. It will not address all
> >>problems fixed by the patchset.
> >
> >We need to stop the hardware at .remove() time. That should not be linked to a
> >videodev, v4l2_device or media_device .release() callback. When the .remove()
> >callback returns the driver is not allowed to touch the hardware anymore. In
> >particular, power domains might clocks or power supplies, leading to invalid
> >access faults if we try to access hardware registers.
> 
> Correct.
> 
> >
> >USB devices get help from the USB core that cancels all USB operations in
> >progress when they're disconnected. Platform devices don't have it as easy,
> >and need to implement everything themselves. We thus need to stop the
> >hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF ioctl at
> >.remove() time.
> 
> Please don't. This shouldn't be done automatically.
> 
> > That could introduce other races between .remove() and the
> >userspace API. A better solution is to make sure the objects that are needed
> >at .release() time of the device node are all reference-counted and only
> >released when the last reference goes away.
> >
> >There's plenty of way to try and work around the problem in drivers, some more
> >racy than others, but if we require changes to all platform drivers to fix
> >this we need to ensure that we get it right, not as half-baked hacks spread
> >around the whole subsystem.
> 
> Why on earth do we want this for the omap3 driver? It is not a hot-pluggable
> device and I see no reason whatsoever to start modifying platform drivers just
> because you can do an unbind. I know there are real hot-pluggable devices, and
> getting this right for those is of course important.
> 
> If the omap3 is used as a testbed, then that's fine by me, but even then I
> probably wouldn't want the omap3 code that makes this possible in the kernel.
> It's just additional code for no purpose.

The same problems exist on other devices, whether platform, pci or USB, as
the problems are in the core frameworks rather than (only) in the drivers.

On platform devices, this happens also when removing the module.

I've used omap3isp as an example since it demonstrates the problems well and
a lot of people have the hardware. Also, Mauro has requested that all
drivers be converted to the new API. I'm fine doing that for the actually
hot-pluggable hardware.

One additional reason is that the omap3isp driver has been used as an
example for writing a number of other drivers, so people should see the
right way to do these things in it, instead of copying code from a driver
doing it wrong.

> 
> >>>That could be done by overwriting the dev.release() callback at
> >>>omap3 driver, as I discussed on my past e-mails, and flagging the
> >>>driver that it should not accept streamon anymore, as the hardware
> >>>is being disconnecting.
> >>
> >>A mutex will be needed to serialise the this with starting streaming.
> >>
> >>>Btw, that explains a lot why Shuah can't reproduce the stuff you're
> >>>complaining on her USB hardware.
> >>>
> >>>The USB subsystem has a a .disconnect() callback that notifies
> >>>the drivers that a device was unbound (likely physically removed).
> >>>The way USB media drivers handle it is by returning -ENODEV to any
> >>>V4L2 call that would try to touch at the hardware after unbound.
> >
> 
> In my view the main problem is that the media core is bound to a struct
> device set by the driver that creates the MC. But since the MC gives an
> overview of lots of other (sub)devices the refcount of the media device
> should be increased for any (sub)device that adds itself to the MC and
> decreased for any (sub)device that is removed. Only when the very last
> user goes away can the MC memory be released.

Agreed.

> 
> The memory/refcounting associated with device nodes is unrelated to this:
> once a devnode is unregistered it will be removed in /dev, and once the
> last open fh closes any memory associated with the devnode can be released.
> That will also decrease the refcount to its parent device.
> 
> This also means that it is a bad idea to embed devnodes in a larger struct.
> They should be allocated and freed when the devnode is unregistered and
> the last open filehandle is closed.

We do have a release() callback for video_device but not for media_device.

> 
> Then the parent's device refcount is decreased, and that may now call its
> release callback if the refcount reaches 0.
> 
> For the media controller's device: any other device driver that needs access
> to it needs to increase that device's refcount, and only when those devices
> are released will they decrease the MC device's refcount.
> 
> And when that refcount goes to 0 can we finally free everything.
> 
> With regards to the opposition to reverting those initial patches, I'm
> siding with Greg KH. Just revert the bloody patches. It worked most of the
> time before those patches, so reverting really won't cause bisect problems.
> 
> Just revert and build up things as they should.
> 
> Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
> parent pointer for example) and many drivers (including omap3isp) embed
> video_device, which is wrong and can lead to complications.
> 
> I'm to blame for the embedding since I thought that was a good idea at one
> time. I now realized that it isn't. Sorry about that...
> 
> And because the cdev of the video_device doesn't know about the parent
> device it is (I think) possible that the parent device is released before
> the cdev is released. Which can result in major problems.

Is embedding cdev really such a problem? Is there a problem you can solve by
not embedding it?

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-16 15:07                             ` Sakari Ailus
@ 2016-12-16 16:34                               ` Laurent Pinchart
  2016-12-19  9:46                               ` Mauro Carvalho Chehab
  1 sibling, 0 replies; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-16 16:34 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Sakari Ailus, Mauro Carvalho Chehab, Shuah Khan, Sakari Ailus,
	linux-media

Hi Hans,

On Friday 16 Dec 2016 17:07:23 Sakari Ailus wrote:
> On Thu, Dec 15, 2016 at 03:03:36PM +0100, Hans Verkuil wrote:
> > On 15/12/16 13:56, Laurent Pinchart wrote:
> >> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:

[snip]

> >>>> It is not clear from the logs what the driver tried to do, but
> >>>> that sounds like a driver's bug, with was not prepared to properly
> >>>> handle unbinds.
> >>>>
> >>>> The problem here is that isp_video_release() is called by V4L2
> >>>> release logic, and not by the MC one:
> >>>>
> >>>> static const struct v4l2_file_operations isp_video_fops = {
> >>>>      .owner          = THIS_MODULE,
> >>>>      .open           = isp_video_open,
> >>>>      .release        = isp_video_release,
> >>>>      .poll           = vb2_fop_poll,
> >>>>      .unlocked_ioctl = video_ioctl2,
> >>>>      .mmap           = vb2_fop_mmap,
> >>>> };
> >>>>
> >>>> It seems that the driver's logic allows it to be called before or
> >>>> after destroying the MC.
> >>>>
> >>>> Assuming that, if the OMAP3 driver is not used it works,
> >>>> it means that, if the isp_video_release() is called
> >>>> first, no errors will happen, but if MC is destroyed before
> >>>> V4L2 call to its .release() callback, as there's no logic at the
> >>>> driver that would detect it, isp_video_release() will be calling
> >>>> isp_video_streamoff(), with depends on the MC to work.
> >>>>
> >>>> On a first glance, I can see two ways of fixing it:
> >>>>
> >>>> 1) to increment devnode's device kobject refcount at OMAP3 .probe(),
> >>>> decrementing it only at isp_video_release(). That will ensure that
> >>>> MC will only be removed after V4L2 removal.
> >>
> >> As soon as you have to dig deep in a structure to find a reference
> >> counter and increment it, bypassing all the API layers, you can be
> >> entirely sure that the solution is wrong.
> > 
> > Indeed.
> > 
> >>>> 2) to call isp_video_streamoff() before removing the MC stuff, e. g.
> >>>> inside the MC .release() callback.
> >>>
> >>> This is a fair suggestion, indeed. Let me see what could be done there.
> >>> Albeit this is just *one* of the existing issues. It will not address
> >>> all problems fixed by the patchset.
> >>
> >> We need to stop the hardware at .remove() time. That should not be linked
> >> to a videodev, v4l2_device or media_device .release() callback. When the
> >> .remove() callback returns the driver is not allowed to touch the
> >> hardware anymore. In particular, power domains might disable clocks or power
> >> supplies, leading to invalid access faults if we try to access hardware
> >> registers.
> > 
> > Correct.
> > 
> >> USB devices get help from the USB core that cancels all USB operations
> >> in progress when they're disconnected. Platform devices don't have it as
> >> easy, and need to implement everything themselves. We thus need to stop
> >> the hardware, but I'm not sure it makes sense to fake a VIDIOC_STREAMOFF
> >> ioctl at .remove() time.
> > 
> > Please don't. This shouldn't be done automatically.
> > 
> >> That could introduce other races between .remove() and the
> >> userspace API. A better solution is to make sure the objects that are
> >> needed at .release() time of the device node are all reference-counted
> >> and only released when the last reference goes away.
> >>
> >> There are plenty of ways to try and work around the problem in drivers, some
> >> more racy than others, but if we require changes to all platform drivers
> >> to fix this we need to ensure that we get it right, not as half-baked
> >> hacks spread around the whole subsystem.
> > 
> > Why on earth do we want this for the omap3 driver? It is not a
> > hot-pluggable device and I see no reason whatsoever to start modifying
> > platform drivers just because you can do an unbind. I know there are real
> > hot-pluggable devices, and getting this right for those is of course
> > important.
> > 
> > If the omap3 is used as a testbed, then that's fine by me, but even then I
> > probably wouldn't want the omap3 code that makes this possible in the
> > kernel. It's just additional code for no purpose.
> 
> The same problems exist on other devices, whether platform, pci or USB, as
> the problems are in the core frameworks rather than (only) in the drivers.
> 
> On platform devices, this happens also when removing the module.
> 
> I've used omap3isp as an example since it demonstrates well the problems and
> a lot of people have the hardware as well. Also, Mauro has requested all
> drivers to be converted to the new API. I'm fine doing that for the
> actually hot-pluggable hardware.
> 
> One additional reason is that as the omap3isp driver has been used as an
> example to write a number of other drivers, people do see what's the right
> way to do these things, instead of copying code from a driver doing it
> wrong.

That's a very important reason in my opinion. If we design core code properly 
it shouldn't be difficult to handle unbind correctly in drivers (I'd argue 
that properly designed core code should make it easier to implement drivers 
correctly than incorrectly, but that's hard to achieve). While we might not 
want to fix all platform device drivers, we need a few good examples, and we 
should push back on new drivers that implement unbind in a racy way.
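
As a rough illustration of the pattern discussed in this thread (stop the
hardware at .remove() time, refuse to start streaming afterwards with -ENODEV,
and serialise both with a mutex), here is a minimal userspace sketch. It is
not omap3isp code: struct fake_isp and all fake_isp_*() functions are
hypothetical names, and the "hardware" is modelled by a plain boolean.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical driver state; names are illustrative only. */
struct fake_isp {
	pthread_mutex_t lock;	/* serialises remove against stream start/stop */
	bool disconnected;	/* set once the remove handler has run */
	bool streaming;		/* models "the hardware is running" */
};

#define FAKE_ISP_INIT { PTHREAD_MUTEX_INITIALIZER, false, false }

/* Models a stream-on request coming from userspace. */
int fake_isp_streamon(struct fake_isp *isp)
{
	int ret = 0;

	pthread_mutex_lock(&isp->lock);
	if (isp->disconnected)
		ret = -ENODEV;		/* device is gone, do not touch hardware */
	else
		isp->streaming = true;
	pthread_mutex_unlock(&isp->lock);

	return ret;
}

/* Models .remove(): the hardware must be quiescent when this returns. */
void fake_isp_remove(struct fake_isp *isp)
{
	pthread_mutex_lock(&isp->lock);
	isp->disconnected = true;
	isp->streaming = false;		/* stop the hardware now, not at release time */
	pthread_mutex_unlock(&isp->lock);
}

/*
 * Models the file release path, which may run long after remove.
 * Returns true only if it had to stop the (still present) hardware.
 */
bool fake_isp_release(struct fake_isp *isp)
{
	bool touched_hw = false;

	pthread_mutex_lock(&isp->lock);
	if (!isp->disconnected && isp->streaming) {
		isp->streaming = false;
		touched_hw = true;
	}
	pthread_mutex_unlock(&isp->lock);

	return touched_hw;
}
```

The point of the sketch is only the ordering: once fake_isp_remove() has
returned, neither the stream-on path nor the release path touches the
hardware state again. Memory lifetime is a separate question and is not
modelled here.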

> >>>> That could be done by overwriting the dev.release() callback at
> >>>> omap3 driver, as I discussed on my past e-mails, and flagging the
> >>>> driver that it should not accept streamon anymore, as the hardware
> >>>> is being disconnected.
> >>>
> >>> A mutex will be needed to serialise this with starting streaming.
> >>>
> >>>> Btw, that explains a lot why Shuah can't reproduce the stuff you're
> >>>> complaining on her USB hardware.
> >>>>
> >>>> The USB subsystem has a .disconnect() callback that notifies
> >>>> the drivers that a device was unbound (likely physically removed).
> >>>> The way USB media drivers handle it is by returning -ENODEV to any
> >>>> V4L2 call that would try to touch at the hardware after unbound.
> > 
> > In my view the main problem is that the media core is bound to a struct
> > device set by the driver that creates the MC. But since the MC gives an
> > overview of lots of other (sub)devices the refcount of the media device
> > should be increased for any (sub)device that adds itself to the MC and
> > decreased for any (sub)device that is removed. Only when the very last
> > user goes away can the MC memory be released.
> 
> Agreed.

When storing a pointer to the media device anywhere, we need to make sure we 
hold a reference. There are two ways to do this, either by borrowing a 
reference or taking a new reference. Borrowing a reference is only valid if we 
know it will exist for at least as long as we need to borrow it. In most cases 
I expect we will need to take new references, but borrowing one should in my 
opinion be allowed where applicable. It should, however, always be accompanied 
by a comment that explains why the reference can be borrowed.
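
To make the distinction concrete, here is a minimal userspace sketch of taking
versus borrowing a reference. It is an assumption-laden model, not media core
code: struct mdev, mdev_get()/mdev_put() and struct video_node are
hypothetical names, and the counter is a plain int (single-threaded only)
rather than a kref.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted media device. */
struct mdev {
	int refcount;		/* not atomic: single-threaded sketch only */
	int *freed_flag;	/* set to 1 by the release path, for observation */
};

struct mdev *mdev_alloc(int *freed_flag)
{
	struct mdev *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->refcount = 1;	/* the allocator's caller owns the first reference */
	m->freed_flag = freed_flag;
	return m;
}

struct mdev *mdev_get(struct mdev *m)
{
	m->refcount++;
	return m;
}

void mdev_put(struct mdev *m)
{
	assert(m->refcount > 0);
	if (--m->refcount == 0) {
		*m->freed_flag = 1;
		free(m);
	}
}

/* Long-lived user: it stores the pointer, so it takes its own reference. */
struct video_node {
	struct mdev *mdev;
};

void video_node_init(struct video_node *v, struct mdev *m)
{
	v->mdev = mdev_get(m);
}

void video_node_release(struct video_node *v)
{
	mdev_put(v->mdev);
	v->mdev = NULL;
}

/*
 * Short-lived user: borrows the caller's reference.
 * This is valid because the caller holds a reference for the whole call
 * and the pointer is not stored anywhere that outlives the call.
 */
int mdev_peek_refcount(const struct mdev *m)
{
	return m->refcount;
}
```

In this model the video node keeps the media device alive after the original
owner has dropped its reference, while the borrowing helper never changes the
count and documents why that is safe.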

> > The memory/refcounting associated with device nodes is unrelated to this:
> > once a devnode is unregistered it will be removed in /dev, and once the
> > last open fh closes any memory associated with the devnode can be
> > released. That will also decrease the refcount to its parent device.
> > 
> > This also means that it is a bad idea to embed devnodes in a larger
> > struct. They should be allocated and freed when the devnode is
> > unregistered and the last open filehandle is closed.
> 
> We do have a release() callback for video_device but not for media_device.
> 
> > Then the parent's device refcount is decreased, and that may now call its
> > release callback if the refcount reaches 0.
> > 
> > For the media controller's device: any other device driver that needs
> > access to it needs to increase that device's refcount, and only when
> > those devices are released will they decrease the MC device's refcount.

I'm not sure I follow you here. What do you mean by media controller's device
? A struct media_device instance ? The struct device pointer stored in the 
struct media_device instance ? The struct media_devnode associated with the 
struct media_device ? The struct device embedded in that struct media_devnode 
? There's lots of devices, could you please clarify by rewriting the 
explanation using structure and field names ?

> > And when that refcount goes to 0 can we finally free everything.
> > 
> > With regards to the opposition to reverting those initial patches, I'm
> > siding with Greg KH. Just revert the bloody patches. It worked most of the
> > time before those patches, so reverting really won't cause bisect
> > problems.
> > 
> > Just revert and build up things as they should.
> > 
> > Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
> > parent pointer for example)

Would you like to submit a patch ? I think it could be merged independently of 
any big rework.

> > and many drivers (including omap3isp) embed video_device, which is wrong
> > and can lead to complications.
> > 
> > I'm to blame for the embedding since I thought that was a good idea at one
> > time. I now realized that it isn't. Sorry about that...

I'm not sure I see why that's wrong. Embedding a struct video_device requires
the driver to implement a .release() callback (which the omap3isp driver 
doesn't, and that's a bug). Are there other issues ? What is your concern ?
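
For illustration, here is a minimal userspace sketch of what a .release()
callback for an embedded node could look like. It is a simplified model, not
the V4L2 API: struct node stands in for a device node with a release
callback, and struct drv_priv, drv_probe() and drv_remove() are hypothetical.
The container_of() macro is defined locally in the same spirit as the kernel
one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified model of a node with a release callback. */
struct node {
	int refcount;	/* not atomic: single-threaded sketch only */
	void (*release)(struct node *n);
};

void node_get(struct node *n)
{
	n->refcount++;
}

void node_put(struct node *n)
{
	assert(n->refcount > 0);
	if (--n->refcount == 0 && n->release)
		n->release(n);
}

/* Driver private structure with the node embedded in it. */
struct drv_priv {
	int id;
	struct node vdev;	/* embedded, so its lifetime is the struct's lifetime */
	int *released;		/* set to 1 when the memory is freed, for observation */
};

/* Release callback: the only place where the private structure is freed. */
void drv_vdev_release(struct node *n)
{
	struct drv_priv *priv = container_of(n, struct drv_priv, vdev);

	*priv->released = 1;
	free(priv);
}

struct drv_priv *drv_probe(int *released)
{
	struct drv_priv *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return NULL;
	priv->released = released;
	priv->vdev.refcount = 1;	/* reference held by the registration */
	priv->vdev.release = drv_vdev_release;
	return priv;
}

/* Models unbind: drop the registration reference, never free directly. */
void drv_remove(struct drv_priv *priv)
{
	node_put(&priv->vdev);
}
```

With an open file handle modelled as an extra node_get(), the private
structure survives drv_remove() and is only freed when the last reference is
dropped.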

> > And because the cdev of the video_device doesn't know about the parent
> > device it is (I think) possible that the parent device is released before
> > the cdev is released. Which can result in major problems.
> 
> Is embedding cdev really such a problem? Is there a problem you can solve by
> not embedding it?

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 14:32                             ` Mauro Carvalho Chehab
  2016-12-15 14:45                               ` Hans Verkuil
@ 2016-12-16 16:43                               ` Laurent Pinchart
  1 sibling, 0 replies; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-16 16:43 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Hans Verkuil, Sakari Ailus, Shuah Khan, Sakari Ailus, linux-media

Hi Mauro,

On Thursday 15 Dec 2016 12:32:07 Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 15:03:36 +0100 Hans Verkuil escreveu:
> > On 15/12/16 13:56, Laurent Pinchart wrote:
> >> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> >>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> >>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> >>>>>> Hi Sakari,
> >> 
> >> There are plenty of ways to try and work around the problem in drivers,
> >> some more racy than others, but if we require changes to all platform
> >> drivers to fix this we need to ensure that we get it right, not as
> >> half-baked hacks spread around the whole subsystem.
> > 
> > Why on earth do we want this for the omap3 driver? It is not a
> > hot-pluggable device and I see no reason whatsoever to start modifying
> > platform drivers just because you can do an unbind. I know there are real
> > hot-pluggable devices, and getting this right for those is of course
> > important.
> 
> That's indeed a very good point. If unbind is not needed by any usecase,
> the better fix for OMAP3 would be to just prevent it from happening in the first
> place.

There are several reasons to implement proper unbind support in the omap3isp 
driver. Sakari has outlined them in another e-mail in this thread, I won't 
copy them here to avoid splitting the discussion.

> >>>> The USB subsystem has a .disconnect() callback that notifies
> >>>> the drivers that a device was unbound (likely physically removed).
> >>>> The way USB media drivers handle it is by returning -ENODEV to any
> >>>> V4L2 call that would try to touch at the hardware after unbound.
> > 
> > In my view the main problem is that the media core is bound to a struct
> > device set by the driver that creates the MC. But since the MC gives an
> > overview of lots of other (sub)devices the refcount of the media device
> > should be increased for any (sub)device that adds itself to the MC and
> > decreased for any (sub)device that is removed. Only when the very last
> > user goes away can the MC memory be released.
> > 
> > The memory/refcounting associated with device nodes is unrelated to this:
> > once a devnode is unregistered it will be removed in /dev, and once the
> > last open fh closes any memory associated with the devnode can be
> > released. That will also decrease the refcount to its parent device.
> > 
> > This also means that it is a bad idea to embed devnodes in a larger
> > struct. They should be allocated and freed when the devnode is
> > unregistered and the last open filehandle is closed.
> > 
> > Then the parent's device refcount is decreased, and that may now call its
> > release callback if the refcount reaches 0.
> > 
> > For the media controller's device: any other device driver that needs
> > access to it needs to increase that device's refcount, and only when
> > those devices are released will they decrease the MC device's refcount.
> > 
> > And when that refcount goes to 0 can we finally free everything.
> > 
> > With regards to the opposition to reverting those initial patches, I'm
> > siding with Greg KH. Just revert the bloody patches. It worked most of the
> > time before those patches, so reverting really won't cause bisect
> > problems.
> 
> You're contradicting yourself here ;)
> 
> The patches that this patch series is reverting are the ones that
> de-embed the devnode struct and fix its lifecycle.
>
> Reverting those patches will cause regressions on hot-pluggable drivers,
> preventing them to be unplugged. So, if we're willing to revert, then we
> should also revert MC support on them.
> 
> > Just revert and build up things as they should.
> > 
> > Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
> > parent pointer for example) and many drivers (including omap3isp) embed
> > video_device, which is wrong and can lead to complications.
> > 
> > I'm to blame for the embedding since I thought that was a good idea at one
> > time. I now realized that it isn't. Sorry about that...
> > 
> > And because the cdev of the video_device doesn't know about the parent
> > device it is (I think) possible that the parent device is released before
> > the cdev is released. Which can result in major problems.
> 
> I agree with you here. IMHO, de-embedding cdev's struct from video_device
> seems to be the right thing to do at V4L2 side too.

I believe Hans' comment about embedded devnodes in larger structures referred 
to embedded video_device and media_device inside driver private structures. 
And even in that case I'm not convinced. I've replied to that in another part 
of the mail thread, let's keep the discussion there.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 15:45                                 ` Mauro Carvalho Chehab
  2016-12-15 16:07                                   ` Hans Verkuil
@ 2016-12-16 16:47                                   ` Laurent Pinchart
  1 sibling, 0 replies; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-16 16:47 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Hans Verkuil, Sakari Ailus, Shuah Khan, Sakari Ailus, linux-media

On Thursday 15 Dec 2016 13:45:52 Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 15:45:22 +0100
> 
> Hans Verkuil <hverkuil@xs4all.nl> escreveu:
> > On 15/12/16 15:32, Mauro Carvalho Chehab wrote:
> > > Em Thu, 15 Dec 2016 15:03:36 +0100
> > > 
> > > Hans Verkuil <hverkuil@xs4all.nl> escreveu:
> > >> On 15/12/16 13:56, Laurent Pinchart wrote:
> > >>> Hi Sakari,
> > >>> 
> > >>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> > >>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> > >>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> > >>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:
> > >>>>>>> Hi Sakari,
> > >>> 
> > >>> There are plenty of ways to try and work around the problem in drivers,
> > >>> some more racy than others, but if we require changes to all platform
> > >>> drivers to fix this we need to ensure that we get it right, not as
> > >>> half-baked hacks spread around the whole subsystem.
> > >> 
> > >> Why on earth do we want this for the omap3 driver? It is not a
> > >> hot-pluggable device and I see no reason whatsoever to start modifying
> > >> platform drivers just because you can do an unbind. I know there are
> > >> real hot-pluggable devices, and getting this right for those is of
> > >> course important.
> > > 
> > > That's indeed a very good point. If unbind is not needed by any usecase,
> > > the better fix for OMAP3 would be to just prevent it to happen in the
> > > first
> > > place.
> > > 
> > >>>>> The USB subsystem has a .disconnect() callback that notifies
> > >>>>> the drivers that a device was unbound (likely physically removed).
> > >>>>> The way USB media drivers handle it is by returning -ENODEV to any
> > >>>>> V4L2 call that would try to touch at the hardware after unbound.
> > >> 
> > >> In my view the main problem is that the media core is bound to a struct
> > >> device set by the driver that creates the MC. But since the MC gives an
> > >> overview of lots of other (sub)devices the refcount of the media device
> > >> should be increased for any (sub)device that adds itself to the MC and
> > >> decreased for any (sub)device that is removed. Only when the very last
> > >> user goes away can the MC memory be released.
> > >> 
> > >> The memory/refcounting associated with device nodes is unrelated to
> > >> this:
> > >> once a devnode is unregistered it will be removed in /dev, and once the
> > >> last open fh closes any memory associated with the devnode can be
> > >> released.
> > >> That will also decrease the refcount to its parent device.
> > >> 
> > >> This also means that it is a bad idea to embed devnodes in a larger
> > >> struct.
> > >> They should be allocated and freed when the devnode is unregistered and
> > >> the last open filehandle is closed.
> > >> 
> > >> Then the parent's device refcount is decreased, and that may now call
> > >> its
> > >> release callback if the refcount reaches 0.
> > >> 
> > >> For the media controller's device: any other device driver that needs
> > >> access to it needs to increase that device's refcount, and only when
> > >> those devices are released will they decrease the MC device's
> > >> refcount.
> > >> 
> > >> And when that refcount goes to 0 can we finally free everything.
> > >> 
> > >> With regards to the opposition to reverting those initial patches, I'm
> > >> siding with Greg KH. Just revert the bloody patches. It worked most of
> > >> the
> > >> time before those patches, so reverting really won't cause bisect
> > >> problems.
> > > 
> > > You're contradicting yourself here ;)
> > > 
> > > The patches that this patch series is reverting are the ones that
> > > de-embed the devnode struct and fix its lifecycle.
> > > 
> > > Reverting those patches will cause regressions on hot-pluggable drivers,
> > > preventing them to be unplugged. So, if we're willing to revert, then we
> > > should also revert MC support on them.
> > 
> > Two options:
> > 
> > 1) Revert, then build up a proper solution.
> 
> Reverting is a regression, as we'll strip off the MC support from the
> existing devices. We would also need to revert a lot more than just those
> 3 patches.

It's not a regression for all the drivers that were already broken before. It 
can be considered as a regression for the drivers that have been broken 
afterwards (as far as I understand that's several USB drivers that you and 
Shuah have migrated to MC in the past few months, but I haven't followed 
driver work closely enough to name them), and I would certainly not oppose to 
additional patches being reverted for those drivers.

> > 2) Do a big-bang patch switching directly over to the new solution, but
> > that's very hard to review.
> > 2a) Post the patch series in small chunks on the mailinglist (starting
> > with the reverts), but once we're all happy merge that patch series into
> > a single big-bang patch and apply that.
> 
> We could do that, but so far, what has been submitted are incomplete,
> as they only touch on a single driver (which doesn't require hot-plugging),
> breaking all the other ones.
> 
> > As far as I am concerned the whole hotplugging code is broken and has been
> > for a very long time. We (or at least I :-) ) understand the underlying
> > concepts a lot better, so we can do a better job. But the transition may
> > well be painful.
> 
> It is not broken currently on the devices that require hotplugging.

It is. The problem is in the core as Sakari has proven multiple times. The race
window might be smaller than it used to be, but the base on top of which our
drivers are built has degraded in a pretty terrible way over the last year.
We need to fix it before it collapses, which requires solving the problem 
correctly before building anything else on top of it.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 14:56                           ` Shuah Khan
@ 2016-12-16 16:58                             ` Laurent Pinchart
  0 siblings, 0 replies; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-16 16:58 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Mauro Carvalho Chehab, Sakari Ailus, Sakari Ailus, linux-media, hverkuil

Hi Shuah,

On Thursday 15 Dec 2016 07:56:55 Shuah Khan wrote:
> On 12/15/2016 03:39 AM, Laurent Pinchart wrote:
> > On Tuesday 13 Dec 2016 15:23:53 Shuah Khan wrote:

[snip]

> >> Please don't pursue this RFC series that makes mc-core changes until
> >> omap3 driver problems are addressed. There is no need to change the
> >> core unless it is necessary.
> > 
> > It is necessary as has been explained countless times, and will become
> > more and more necessary as media_device instances get shared between
> > multiple drivers, which is currently attempted *without* reference
> > counting.
> 
> You are probably forgetting the Media Device Allocator API work I did
> to make media_device sharable across media and audio drivers.

I haven't. How could I forget it ? :-) Media device sharing is important, and 
will become even more so in the future.

> Sakari's patches don't address the sharable need.

That's correct.

> I have been asking Sakari to use Media Device Allocator API in his patch
> series for allocating media device.

That's where I disagree. The more we dig the more we realize that the current 
infrastructure is broken. Adding anything on top of a construction that is on 
the verge of collapsing isn't a good idea until the foundations have been 
fixed and consolidated.

> I discussed the conflicts between the work I am doing and Sakari's series
> to find a common ground. But it doesn't look like we are going to get there.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-16 14:45                                                   ` Hans Verkuil
@ 2016-12-19  9:28                                                     ` Mauro Carvalho Chehab
  2016-12-21  1:31                                                       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-19  9:28 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Shuah Khan, Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

Em Fri, 16 Dec 2016 15:45:10 +0100
Hans Verkuil <hverkuil@xs4all.nl> escreveu:

> We really need a whiteboard for this :-(

Well, we could schedule a media summit together with ELC NA.

ELC will be in Feb, 21-23 in Portland.

Comments?
Mauro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-16 15:07                             ` Sakari Ailus
  2016-12-16 16:34                               ` Laurent Pinchart
@ 2016-12-19  9:46                               ` Mauro Carvalho Chehab
  2017-01-02  7:53                                 ` Sakari Ailus
  1 sibling, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-19  9:46 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Hans Verkuil, Laurent Pinchart, Shuah Khan, Sakari Ailus,
	linux-media, Helen Koike

Em Fri, 16 Dec 2016 17:07:23 +0200
Sakari Ailus <sakari.ailus@iki.fi> escreveu:

> Hi Hans,

> > chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
> > on release(). Thus ensuring that the cdev can never be removed while in an
> > ioctl.  
> 
> It does, but it does not affect memory which is allocated separately of that.
> 
> See this:
> 
> <URL:https://www.mail-archive.com/linux-media@vger.kernel.org/msg106390.html>

That sounds promising. If this bug affects drivers other than OMAP3,
then indeed the core has a bug.

I'll see if I can reproduce it here with some USB drivers later this week.

> > If the omap3 is used as a testbed, then that's fine by me, but even then I
> > probably wouldn't want the omap3 code that makes this possible in the kernel.
> > It's just additional code for no purpose.  
> 
> The same problems exist on other devices, whether platform, pci or USB, as
> the problems are in the core frameworks rather than (only) in the drivers.
> 
> On platform devices, this happens also when removing the module.
> 
> I've used omap3isp as an example since it demonstrates well the problems and
> a lot of people have the hardware as well. Also, Mauro has requested all
> drivers to be converted to the new API. I'm fine doing that for the actually
> hot-pluggable hardware.

While IMHO it is overkill trying to support hot plug on omap3, I won't
mind if you do that, provided that your patch series can be applied in
a way that it won't cause regressions for real hot-pluggable hardware.

I still think that keeping cdev embedded in a structure that is
created dynamically when registering the device node is better than
embedding it in struct media_device. Yet, if you prove that this does
more harm than good, I'm OK with re-embedding it. However, in that case,
you need to put the patches re-embedding it at the end of the patch
series (and not at the beginning), as otherwise you'll be causing
regressions.

> One additional reason is that as the omap3isp driver has been used as an
> example to write a number of other drivers, people do see what's the right
> way to do these things, instead of copying code from a driver doing it
> wrong.

Interesting argument. Yet, IMHO, the best would be to do the proper
review on the first platform driver that would support hot-plug,
and use this as an example. It is a shame that project Aurora was
discontinued, as media drivers for such kind of hardware would be an
interesting example.

On that matter, just like we use vivid as a testbench and as an
example for other drivers, it would be great if we could merge
the vimc driver. What's the status of Helen's patchset?

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-19  9:28                                                     ` Media summit in Feb? - Was: " Mauro Carvalho Chehab
@ 2016-12-21  1:31                                                       ` Mauro Carvalho Chehab
  2016-12-21 14:27                                                         ` Shuah Khan
  2016-12-22 17:47                                                         ` Laurent Pinchart
  0 siblings, 2 replies; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-21  1:31 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Shuah Khan, Laurent Pinchart, Sakari Ailus, Sakari Ailus, linux-media

Em Mon, 19 Dec 2016 07:28:29 -0200
Mauro Carvalho Chehab <mchehab@s-opensource.com> escreveu:

> Em Fri, 16 Dec 2016 15:45:10 +0100
> Hans Verkuil <hverkuil@xs4all.nl> escreveu:
> 
> > We really need a whiteboard for this :-(
> 
> Well, we could schedule a media summit together with ELC NA.
> 
> ELC will be in Feb, 21-23 in Portland.

Btw, I'm pre reserving a room for us in Feb, 20, assuming that
people can make it.


Thanks,
Mauro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-21  1:31                                                       ` Mauro Carvalho Chehab
@ 2016-12-21 14:27                                                         ` Shuah Khan
  2016-12-22 17:47                                                         ` Laurent Pinchart
  1 sibling, 0 replies; 89+ messages in thread
From: Shuah Khan @ 2016-12-21 14:27 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Hans Verkuil, Shuah Khan, Laurent Pinchart, Sakari Ailus,
	Sakari Ailus, linux-media, Shuah Khan

On Tue, Dec 20, 2016 at 6:31 PM, Mauro Carvalho Chehab
<mchehab@s-opensource.com> wrote:
> Em Mon, 19 Dec 2016 07:28:29 -0200
> Mauro Carvalho Chehab <mchehab@s-opensource.com> escreveu:
>
>> Em Fri, 16 Dec 2016 15:45:10 +0100
>> Hans Verkuil <hverkuil@xs4all.nl> escreveu:
>>
>> > We really need a whiteboard for this :-(
>>
>> Well, we could schedule a media summit together with ELC NA.
>>
>> ELC will be in Feb, 21-23 in Portland.
>
> Btw, I'm pre reserving a room for us in Feb, 20, assuming that
> people can make it.
>

Hi Mauro,

I can make it.
>
> Thanks,
> Mauro

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-21  1:31                                                       ` Mauro Carvalho Chehab
  2016-12-21 14:27                                                         ` Shuah Khan
@ 2016-12-22 17:47                                                         ` Laurent Pinchart
  2016-12-22 20:43                                                           ` Mauro Carvalho Chehab
  1 sibling, 1 reply; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-22 17:47 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Hans Verkuil, Shuah Khan, Sakari Ailus, Sakari Ailus, linux-media

Hi Mauro,

On Tuesday 20 Dec 2016 23:31:42 Mauro Carvalho Chehab wrote:
> Em Mon, 19 Dec 2016 07:28:29 -0200 Mauro Carvalho Chehab escreveu:
> > Em Fri, 16 Dec 2016 15:45:10 +0100 Hans Verkuil escreveu:
> >> We really need a whiteboard for this :-(
> > 
> > Well, we could schedule a media summit together with ELC NA.
> > 
> > ELC will be in Feb, 21-23 in Portland.
> 
> Btw, I'm pre reserving a room for us in Feb, 20, assuming that
> people can make it.

I'll be in Portland from the 18th to the 25th. I'm not sure yet whether I'll 
be free on the 20th though as I also have other meetings to schedule. I'll try 
to find out soon.

-- 
Regards,

Laurent Pinchart


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: Media summit in Feb? - Was: Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-22 17:47                                                         ` Laurent Pinchart
@ 2016-12-22 20:43                                                           ` Mauro Carvalho Chehab
  0 siblings, 0 replies; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2016-12-22 20:43 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Hans Verkuil, Shuah Khan, Sakari Ailus, Sakari Ailus, linux-media

Em Thu, 22 Dec 2016 19:47:15 +0200
Laurent Pinchart <laurent.pinchart@ideasonboard.com> escreveu:

> Hi Mauro,
> 
> On Tuesday 20 Dec 2016 23:31:42 Mauro Carvalho Chehab wrote:
> > Em Mon, 19 Dec 2016 07:28:29 -0200 Mauro Carvalho Chehab escreveu:  
> > > Em Fri, 16 Dec 2016 15:45:10 +0100 Hans Verkuil escreveu:  
> > >> We really need a whiteboard for this :-(  
> > > 
> > > Well, we could schedule a media summit together with ELC NA.
> > > 
> > > ELC will be in Feb, 21-23 in Portland.  
> > 
> > Btw, I'm pre reserving a room for us in Feb, 20, assuming that
> > people can make it.  
> 
> I'll be in Portland from the 18th to the 25th. I'm not sure yet whether I'll 
> be free on the 20th though as I also have other meetings to schedule. I'll try 
> to find out soon.

Unfortunately, Feb, 20th seems to be the only day outside ELC
that LF can provide us a room. We could try to do something
between Feb, 21-23, but I guess this would be harder for us, as
people may have different arrangements during ELC days.

Thanks,
Mauro

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 14:45                             ` Shuah Khan
  2016-12-15 15:26                               ` Hans Verkuil
@ 2016-12-23 17:27                               ` Laurent Pinchart
  1 sibling, 0 replies; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-23 17:27 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Hans Verkuil, Sakari Ailus, Mauro Carvalho Chehab, Sakari Ailus,
	linux-media

Hi Shuah,

On Thursday 15 Dec 2016 07:45:29 Shuah Khan wrote:
> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
> > On 15/12/16 13:56, Laurent Pinchart wrote:
> >> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> >>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> >>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:

[snip]

> >> There's plenty of way to try and work around the problem in drivers, some
> >> more racy than others, but if we require changes to all platform drivers
> >> to fix this we need to ensure that we get it right, not as half-baked
> >> hacks spread around the whole subsystem.
> > 
> > Why on earth do we want this for the omap3 driver? It is not a
> > hot-pluggable device and I see no reason whatsoever to start modifying
> > platform drivers just because you can do an unbind. I know there are real
> > hot-pluggable devices, and getting this right for those is of course
> > important.
> 
> This was my first reaction when I saw this RFC series. None of the platform
> drivers are designed to be unbound. Making core changes based on such as
> driver would make the core very complex.
>
> We can't even reproduce the problem on other drivers.
> 
> > If the omap3 is used as a testbed, then that's fine by me, but even then I
> > probably wouldn't want the omap3 code that makes this possible in the
> > kernel. It's just additional code for no purpose.
> 
> I agree with Hans. Why are we using the most complex case as a reference
> driver


The omap3isp driver is a very good test case, as it registers a media 
controller device node, multiple video device nodes and multiple subdev device 
nodes. This is not an exceptional situation (and is actually simpler than a 
driver that would also register an audio device, as we would span multiple 
subsystems there). If we can't design a clean lifetime management solution for 
MC and V4L2 objects that fixes the unbind problem with omap3isp, then we might 
as well give up on kernel development completely.

> and basing that driver to make core changes which will force changes to all
> the driver that use mc-core?

Making changes to all drivers isn't a goal. My goal is to fix the object 
lifetime management problem cleanly. If there's a way to do so that minimizes 
changes to drivers, great. Otherwise, we'll have to bite the bullet. The MC 
and V4L2 core code is the foundation on top of which everything is built, so 
it has to be fail-proof and clean.

> >>>> That could be done by overwriting the dev.release() callback at
> >>>> omap3 driver, as I discussed on my past e-mails, and flagging the
> >>>> driver that it should not accept streamon anymore, as the hardware
> >>>> is being disconnecting.
> >>> 
> >>> A mutex will be needed to serialise the this with starting streaming.
> >>> 
> >>>> Btw, that explains a lot why Shuah can't reproduce the stuff you're
> >>>> complaining on her USB hardware.
> >>>> 
> >>>> The USB subsystem has a a .disconnect() callback that notifies
> >>>> the drivers that a device was unbound (likely physically removed).
> >>>> The way USB media drivers handle it is by returning -ENODEV to any
> >>>> V4L2 call that would try to touch at the hardware after unbound.
> > 
> > In my view the main problem is that the media core is bound to a struct
> > device set by the driver that creates the MC. But since the MC gives an
> > overview of lots of other (sub)devices the refcount of the media device
> > should be increased for any (sub)device that adds itself to the MC and
> > decreased for any (sub)device that is removed. Only when the very last
> > user goes away can the MC memory be released.
> 
> Correct. Media Device Allocator API work I did allows creating media device
> on parent USB device to allow media sound driver share the media device. It
> does ref-counting on media device and media device is unregistered only when
> the last driver unregisters it.

It doesn't address references taken to the media_device from v4l2_subdev and 
video_device though. I believe we need one reference counting implementation 
that can cover both references from other objects and media_device sharing.
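
As a rough illustration of what a single implementation could look like, here 
is a minimal userspace sketch in plain C. All names (mdev_model and friends) 
are hypothetical, and a plain int stands in for a kref, so it is 
single-threaded only; it is not the actual MC code. References held by a video 
node and by a second driver sharing the device go through the same get/put 
pair, and unregistering only clears a flag.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model; not the kernel's struct media_device. */
struct mdev_model {
	int refcount;	/* plain int: single-threaded sketch only */
	int registered;	/* cleared on unregister, memory stays valid */
	void (*release)(struct mdev_model *mdev);
};

static int mdev_released_count;	/* counts release calls, for demonstration */

static void mdev_model_release(struct mdev_model *mdev)
{
	mdev_released_count++;
	free(mdev);
}

static struct mdev_model *mdev_model_alloc(void)
{
	struct mdev_model *mdev = calloc(1, sizeof(*mdev));

	if (!mdev)
		return NULL;
	mdev->refcount = 1;	/* reference owned by the allocating driver */
	mdev->registered = 1;
	mdev->release = mdev_model_release;
	return mdev;
}

static struct mdev_model *mdev_model_get(struct mdev_model *mdev)
{
	mdev->refcount++;
	return mdev;
}

/* Returns 1 if this put dropped the last reference and freed the object. */
static int mdev_model_put(struct mdev_model *mdev)
{
	assert(mdev->refcount > 0);
	if (--mdev->refcount > 0)
		return 0;
	mdev->release(mdev);
	return 1;
}

/* Unregistering only marks the object; it never frees memory. */
static void mdev_model_unregister(struct mdev_model *mdev)
{
	mdev->registered = 0;
}

/* A video node and a second (sharing) driver each hold a reference. */
static int mdev_model_demo(void)
{
	struct mdev_model *mdev = mdev_model_alloc();
	struct mdev_model *video_ref, *shared_ref;
	int freed = 0;

	if (!mdev)
		return -1;
	video_ref = mdev_model_get(mdev);
	shared_ref = mdev_model_get(mdev);

	mdev_model_unregister(mdev);
	freed += mdev_model_put(mdev);		/* driver drops its reference */
	freed += mdev_model_put(video_ref);	/* object is still alive */
	freed += mdev_model_put(shared_ref);	/* last reference: released */
	return freed;
}
```

Calling mdev_model_demo() yields 1: only the final put releases the memory, 
well after unregistration, regardless of which kind of user held that last 
reference.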

> There is another aspect to explore regarding media device and the graph.
> 
> Should all the entities stick around until all references to media
> device are gone? If an application has /dev/media open, does that
> mean all entities should not be free'd until this app. exits?

Probably not, as we need to target dynamic updates of the media graph.

> What should happen if an app. is streaming? Should the graph stay intact
> until the app. exits?

I'll need to give this more thought, but I'd say yes.

>    If yes, this would pose problems when we have multiple drivers bound
>    to the media device. When audio driver goes away for example, it should
>    be allowed to delete its entities.

There are two parts to "driver goes away". When the device is unbound from the 
driver, entities should probably not be deleted immediately but should instead 
be reference-counted. Module removal should be blocked by taking a reference 
to the module until all related entities have been freed.

> The approach current mc-core takes is that the media_device and
> media_devnode stick around, but entities can be added and removed during
> media_device lifetime.

Adding and removing entities during the lifetime of media_device is needed, 
but it doesn't mean that removal should release the entity synchronously.

> If an app. is still running when media_device is unregistered, media_device
> isn't released until the last reference goes away and ioctls can check if
> media_device is registered or not.
> 
> We have to decide on the larger lifetime question surrounding media_device
> and graph as well.
> 
> > The memory/refcounting associated with device nodes is unrelated to this:
> > once a devnode is unregistered it will be removed in /dev, and once the
> > last open fh closes any memory associated with the devnode can be
> > released. That will also decrease the refcount to its parent device.
> > 
> > This also means that it is a bad idea to embed devnodes in a larger
> > struct. They should be allocated and freed when the devnode is
> > unregistered and the last open filehandle is closed.
> > 
> > Then the parent's device refcount is decreased, and that may now call its
> > release callback if the refcount reaches 0.
> > 
> > For the media controller's device: any other device driver that needs
> > access to it needs to increase that device's refcount, and only when
> > those devices are released will they decrease the MC device's refcount.
> > 
> > And when that refcount goes to 0 can we finally free everything.
> > 
> > With regards to the opposition to reverting those initial patches, I'm
> > siding with Greg KH. Just revert the bloody patches. It worked most of the
> > time before those patches, so reverting really won't cause bisect
> > problems.
> > 
> > Just revert and build up things as they should.
> > 
> > Note that v4l2-dev.c doesn't do things correctly (it doesn't set the cdev
> > parent pointer for example) and many drivers (including omap3isp) embed
> > video_device, which is wrong and can lead to complications.
> > 
> > I'm to blame for the embedding since I thought that was a good idea at one
> > time. I now realized that it isn't. Sorry about that...
> > 
> > And because the cdev of the video_device doesn't know about the parent
> > device it is (I think) possible that the parent device is released before
> > the cdev is released. Which can result in major problems.

-- 
Regards,

Laurent Pinchart



* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 15:26                               ` Hans Verkuil
  2016-12-15 16:06                                 ` Shuah Khan
  2016-12-15 17:08                                 ` Mauro Carvalho Chehab
@ 2016-12-23 17:48                                 ` Laurent Pinchart
  2 siblings, 0 replies; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-23 17:48 UTC (permalink / raw)
  To: Hans Verkuil
  Cc: Shuah Khan, Sakari Ailus, Mauro Carvalho Chehab, Sakari Ailus,
	linux-media

Hi Hans,

On Thursday 15 Dec 2016 16:26:19 Hans Verkuil wrote:
> On 15/12/16 15:45, Shuah Khan wrote:
> > On 12/15/2016 07:03 AM, Hans Verkuil wrote:

[snip]

> >> In my view the main problem is that the media core is bound to a struct
> >> device set by the driver that creates the MC. But since the MC gives an
> >> overview of lots of other (sub)devices the refcount of the media device
> >> should be increased for any (sub)device that adds itself to the MC and
> >> decreased for any (sub)device that is removed. Only when the very last
> >> user goes away can the MC memory be released.
> > 
> > Correct. Media Device Allocator API work I did allows creating media
> > device on parent USB device to allow media sound driver share the media
> > device. It does ref-counting on media device and media device is
> > unregistered only when the last driver unregisters it.
> > 
> > There is another aspect to explore regarding media device and the graph.
> > 
> > Should all the entities stick around until all references to media
> > device are gone? If an application has /dev/media open, does that
> > mean all entities should not be free'd until this app. exits? What
> > should happen if an app. is streaming? Should the graph stay intact
> > until the app. exits?
> 
> Yes, everything must stay around until the last user has disappeared.
> 
> In general unplugs can happen at any time. So applications can be in the
> middle of an ioctl, and removing memory during that time is just
> impossible.
> 
> On unplug you:
> 
> 1) stop any HW DMA (highly device dependent)
> 2) wake up any filehandles that wait for an event
> 3) unregister any device nodes

Shouldn't 2 and 3 be switched? We also need to return all buffers to vb2 
without any race condition, so I would say the sequence of events should be as 
follows.

1. Mark the device as being disconnected. This condition should be tested by 
the .buf_queue() handler that will then return the buffer immediately to vb2 
with the state set to VB2_BUF_STATE_ERROR.
2. Stop hardware operation (DMA, interrupts, ...).
3. Unregister the devnodes. This shall result in all new ioctl calls being 
blocked by the core.
4. Wake up all waiters.

There's still a race between 2 and 3, as new hardware operations could be 
started. We need to decide how to handle that.
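
To make the ordering concrete, here is a small userspace model of the four 
steps, in plain C. Every name in it (cam_model, MODEL_BUF_ERROR, ...) is 
hypothetical, MODEL_BUF_ERROR merely stands in for VB2_BUF_STATE_ERROR, and 
returning -ENODEV for blocked ioctls is an assumption. It is single-threaded, 
so it shows the order of operations only and says nothing about the race 
between steps 2 and 3.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the vb2 buffer states mentioned above. */
enum model_buf_state { MODEL_BUF_ACTIVE, MODEL_BUF_ERROR };

/* Hypothetical device state; not a real V4L2 or MC structure. */
struct cam_model {
	bool disconnected;		/* set first, tested by buf_queue */
	bool hw_running;		/* DMA and interrupts active */
	bool devnode_registered;	/* new ioctls allowed while true */
	int sleeping_waiters;		/* file handles blocked on an event */
};

/* Models .buf_queue(): hand the buffer straight back once disconnected. */
static enum model_buf_state cam_model_buf_queue(const struct cam_model *cam)
{
	return cam->disconnected ? MODEL_BUF_ERROR : MODEL_BUF_ACTIVE;
}

/* Models the core rejecting new ioctls after devnode unregistration. */
static int cam_model_ioctl(const struct cam_model *cam)
{
	return cam->devnode_registered ? 0 : -ENODEV;
}

/* Models the unbind handler: runs the four steps, then returns. */
static void cam_model_disconnect(struct cam_model *cam)
{
	cam->disconnected = true;		/* 1. mark as disconnected */
	cam->hw_running = false;		/* 2. stop hardware operation */
	assert(!cam->hw_running);		/* hardware is idle before step 3 */
	cam->devnode_registered = false;	/* 3. unregister the devnodes */
	cam->sleeping_waiters = 0;		/* 4. wake up all waiters */
}
```

After cam_model_disconnect() returns, cam_model_buf_queue() reports 
MODEL_BUF_ERROR and cam_model_ioctl() fails, while the structure itself stays 
valid until its last user is gone.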

The uvcvideo driver handles this in a reasonably clean way (at least for the 
video devnodes, there are races related to the media controller devnode), but 
the driver-side implementation is a bit complex (look at the comment in 
uvc_queue_cancel(), and how uvc_unregister_video() has to increase the 
refcount temporarily for instance) even if the fact that the USB core stops 
hardware access simplifies step 2 above. It would be nice if we could move 
some of the code to the core.

> Then just sit back and wait for refcounts to go down as filehandles are
> closed by the application.

"Sit back" doesn't mean that the unbind handler (.remove for platform devices, 
.disconnect for USB devices, ...) blocks, right? It should return after 
completing the steps above.

> Note: the v4l2/media/cec/IR/whatever core is typically responsible for
> rejecting any ioctls/mmap/etc. once the device node has been unregistered.
> The only valid file operation is release().

That's a very good start. The hard part is then the handling of ioctls in 
progress.

> >    If yes, this would pose problems when we have multiple drivers bound
> >    to the media device. When audio driver goes away for example, it should
> >    be allowed to delete its entities.
> 
> Only if you can safely remove it from the topology data structures while
> being 100% certain that nobody can ever access it. I'm not sure if that is
> the case.

In some cases it might be, but I don't think we can build anything on top of 
that assumption.

> Actually, looking at e.g. adv7604.c it does
> media_entity_cleanup(&sd->entity); in remove() which is an empty function,
> so there doesn't appear any attempt to safely clean up an entity (i.e. make
> sure no running media ioctl can access it or call ops).
> 
> This probably will need to be serialized with the graph_mutex lock.

In the worst case, but we should try to minimize lock contention with proper 
refcounting.

> > The approach current mc-core takes is that the media_device and
> > media_devnode stick around, but entities can be added and removed during
> > media_device lifetime.
> 
> Seems reasonable. But the removal needs to be done carefully, and that
> doesn't seem to be the case now (unless adv7604.c is just buggy).
> 
> > If an app. is still running when media_device is unregistered,
> > media_device isn't released until the last reference goes away and ioctls
> > can check if media_device is registered or not.
> > 
> > We have to decide on the larger lifetime question surrounding media_device
> > and graph as well.
> 
> I don't think there is any choice but to keep it all alive until the last
> reference goes away.

I agree with this as a general principle. Now we'll have to analyse the 
problem in detail and see where references can be borrowed, which could 
simplify the implementation. For instance I don't think we'll need to refcount 
pad objects.

-- 
Regards,

Laurent Pinchart



* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 17:08                                 ` Mauro Carvalho Chehab
@ 2016-12-23 17:55                                   ` Laurent Pinchart
  0 siblings, 0 replies; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-23 17:55 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Hans Verkuil, Shuah Khan, Sakari Ailus, Sakari Ailus, linux-media

Hi Mauro,

On Thursday 15 Dec 2016 15:08:26 Mauro Carvalho Chehab wrote:
> Em Thu, 15 Dec 2016 16:26:19 +0100 Hans Verkuil escreveu:
> >> Should all the entities stick around until all references to media
> >> device are gone? If an application has /dev/media open, does that
> >> mean all entities should not be free'd until this app. exits? What
> >> should happen if an app. is streaming? Should the graph stay intact
> >> until the app. exits?
> > 
> > Yes, everything must stay around until the last user has disappeared.
> > 
> > In general unplugs can happen at any time. So applications can be in the
> > middle of an ioctl, and removing memory during that time is just
> > impossible.
> > 
> > On unplug you:
> > 
> > 1) stop any HW DMA (highly device dependent)
> > 2) wake up any filehandles that wait for an event
> > 3) unregister any device nodes
> > 
> > Then just sit back and wait for refcounts to go down as filehandles are
> > closed by the application.
> > 
> > Note: the v4l2/media/cec/IR/whatever core is typically responsible for
> > rejecting any ioctls/mmap/etc. once the device node has been
> > unregistered. The only valid file operation is release().
> 
> Agreed. The problem on OMAP3 is that it doesn't stop HW DMA when
> struct media_devnode is released. It tries to do it later, when the
> V4L2 core is unbind, by trying to dig into the media controller
> struct that the driver removed before.

Note that stopping the hardware doesn't mean updating the pipeline state to 
mark it as stopped. Unlike stopping the hardware that is mandatory at unbind 
time as hardware access is not allowed after the unbind handler returns, how 
we handle the software state is entirely up to us. I'm not saying it can't be 
done at unbind time, but I'm not sure yet whether it should be either.

> That's said, for OMAP3 and all other drivers that don't support hot unplug,
> I would just use suppress_bind_attrs, as I fail to see any need to allow
> unbinding them.

That's akin to breaking the thermometer to cure the patient of fever. I'm 
not completely opposed to making drivers non-unbindable on a case-by-case 
basis (and based on the author's will), but if a driver author wants to make a 
driver unbindable the core should allow that to be implemented.

-- 
Regards,

Laurent Pinchart



* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-15 16:06                                 ` Shuah Khan
  2016-12-15 16:28                                   ` Hans Verkuil
@ 2016-12-23 18:13                                   ` Laurent Pinchart
  1 sibling, 0 replies; 89+ messages in thread
From: Laurent Pinchart @ 2016-12-23 18:13 UTC (permalink / raw)
  To: Shuah Khan
  Cc: Hans Verkuil, Sakari Ailus, Mauro Carvalho Chehab, Sakari Ailus,
	linux-media

Hi Shuah,

On Thursday 15 Dec 2016 09:06:41 Shuah Khan wrote:
> On 12/15/2016 08:26 AM, Hans Verkuil wrote:
> > On 15/12/16 15:45, Shuah Khan wrote:
> >> On 12/15/2016 07:03 AM, Hans Verkuil wrote:
> >>> On 15/12/16 13:56, Laurent Pinchart wrote:
> >>>> On Thursday 15 Dec 2016 13:30:41 Sakari Ailus wrote:
> >>>>> On Tue, Dec 13, 2016 at 10:24:47AM -0200, Mauro Carvalho Chehab wrote:
> >>>>>> Em Tue, 13 Dec 2016 12:53:05 +0200 Sakari Ailus escreveu:
> >>>>>>> On Tue, Nov 29, 2016 at 09:13:05AM -0200, Mauro Carvalho Chehab wrote:

[snip]

> >>> In my view the main problem is that the media core is bound to a struct
> >>> device set by the driver that creates the MC. But since the MC gives an
> >>> overview of lots of other (sub)devices the refcount of the media device
> >>> should be increased for any (sub)device that adds itself to the MC and
> >>> decreased for any (sub)device that is removed. Only when the very last
> >>> user goes away can the MC memory be released.
> >> 
> >> Correct. Media Device Allocator API work I did allows creating media
> >> device on parent USB device to allow media sound driver share the media
> >> device. It does ref-counting on media device and media device is
> >> unregistered only when the last driver unregisters it.
> >> 
> >> There is another aspect to explore regarding media device and the graph.
> >> 
> >> Should all the entities stick around until all references to media
> >> device are gone? If an application has /dev/media open, does that
> >> mean all entities should not be free'd until this app. exits? What
> >> should happen if an app. is streaming? Should the graph stay intact
> >> until the app. exits?
> > 
> > Yes, everything must stay around until the last user has disappeared.
> > 
> > In general unplugs can happen at any time. So applications can be in the
> > middle of an ioctl, and removing memory during that time is just
> > impossible.
> > 
> > On unplug you:
> > 
> > 1) stop any HW DMA (highly device dependent)
> > 2) wake up any filehandles that wait for an event
> > 3) unregister any device nodes
> > 
> > Then just sit back and wait for refcounts to go down as filehandles are
> > closed by the application.
> > 
> > Note: the v4l2/media/cec/IR/whatever core is typically responsible for
> > rejecting any ioctls/mmap/etc. once the device node has been
> > unregistered. The only valid file operation is release().
> > 
> >>    If yes, this would pose problems when we have multiple drivers bound
> >>    to the media device. When audio driver goes away for example, it
> >>    should
> >>    be allowed to delete its entities.
> > 
> > Only if you can safely remove it from the topology data structures while
> > being 100% certain that nobody can ever access it. I'm not sure if that is
> > the case. Actually, looking at e.g. adv7604.c it does
> > media_entity_cleanup(&sd->entity); in remove() which is an empty
> > function, so there doesn't appear any attempt to safely clean up an
> > entity (i.e. make sure no running media ioctl can access it or call ops).
> 
> Right. media_entity_cleanup() nothing at the moment. Also if it gets called
> after media_device_unregister_entity(), it could pose problems. I wonder if
> we have drivers that are calling media_entity_cleanup() after unregistering
> the entity?
> 
> > This probably will need to be serialized with the graph_mutex lock.
> > 
> >> The approach current mc-core takes is that the media_device and
> >> media_devnode stick around, but entities can be added and removed during
> >> media_device lifetime.
> > 
> > Seems reasonable. But the removal needs to be done carefully, and that
> > doesn't seem to be the case now (unless adv7604.c is just buggy).
> 
> Correct. It is possible media_device is embedded in this driver.

I assume you mean the private data structure instantiated by the adv7604 
driver. That can't be the case, as adv7604 is a subdev.

> When driver that embeds is unbound, media_device goes away. I needed to make
> the media device refcounted and sharable for audio work and that is what the
> Media Device Allocator API does.
> 
> Maybe we have more cases than this audio case that requires media_device
> refcounted. If we have to keep entities that are in use until all the
> references go away, we have to ref-count them as well.

I think we're converging towards refcounting media_device to manage its 
lifetime, so there are more cases, yes. That's why I propose first making 
media_device refcounted, and then adding the allocator API on top of that, 
given that the allocator API requires refcounting. That seems to me to be the 
cleanest approach.

> >> If an app. is still running when media_device is unregistered,
> >> media_device isn't released until the last reference goes away and ioctls
> >> can check if media_device is registered or not.
> >> 
> >> We have to decide on the larger lifetime question surrounding
> >> media_device and graph as well.
> > 
> > I don't think there is any choice but to keep it all alive until the last
> > reference goes away.
> 
> If you mean "all alive" entities as well, we have to ref-count them. Because
> drivers can unregister entities during run-time now. I am looking at the
> use-case where, a driver that has dvb and video and what should happen when
> dvb is unbound for example. Should dvb entities go away or should they stay
> until all the drivers are unbound?

I believe they'll have to stay until they're not referenced anymore, which 
could (and likely should) be earlier than when the other drivers are unbound.

> v4l2-core registers and unregisters entities and so does dvb-core. So when a
> driver unregisters video and dvb, these entities get deleted.

Not if we refcount entities: deletion won't be synchronous with unregistration 
in that case.

> So we have a distributed mode of registering and unregistering entities. We
> also have ioctls (video, dvb, and media) accessing these entities. So where
> do we make changes to ensure entities stick around until all users exit?
> 
> Ref-counting entities won't work if they are embedded - like in the case of
> struct video_device which embeds the media entity. When struct video goes
> away then entity will disappear.

Why is that? If the entity is referenced it should certainly prevent 
video_device from disappearing.
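
A minimal userspace sketch of that idea, in plain C with hypothetical names 
(entity_model, vdev_model; none of this is the real video_device or 
media_entity code): the refcount lives in the embedded entity, and its release 
callback frees the whole containing structure, so a reference to the entity 
keeps the container alive. A plain int counter is used, so this is 
single-threaded only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace equivalent of the kernel's container_of() helper. */
#define model_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical refcounted entity; not the kernel's struct media_entity. */
struct entity_model {
	int refcount;
	void (*release)(struct entity_model *entity);
};

/* Hypothetical video device that embeds its entity. */
struct vdev_model {
	int minor;
	struct entity_model entity;	/* embedded, not separately allocated */
};

static int vdev_model_freed;	/* counts frees, for demonstration */

/* Releasing the entity frees the structure that embeds it. */
static void vdev_model_entity_release(struct entity_model *entity)
{
	struct vdev_model *vdev =
		model_container_of(entity, struct vdev_model, entity);

	vdev_model_freed++;
	free(vdev);
}

static struct vdev_model *vdev_model_alloc(int minor)
{
	struct vdev_model *vdev = calloc(1, sizeof(*vdev));

	if (!vdev)
		return NULL;
	vdev->minor = minor;
	vdev->entity.refcount = 1;	/* reference held by the driver */
	vdev->entity.release = vdev_model_entity_release;
	return vdev;
}

static void entity_model_get(struct entity_model *entity)
{
	entity->refcount++;
}

static void entity_model_put(struct entity_model *entity)
{
	assert(entity->refcount > 0);
	if (--entity->refcount == 0)
		entity->release(entity);
}
```

With one extra entity_model_get() outstanding (say, a graph walk in progress), 
the driver's own entity_model_put() at unregistration time does not free the 
vdev_model; that only happens on the last put.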

> So we do have a complex lifetime model here that we need to figure out how
> to fix.

-- 
Regards,

Laurent Pinchart



* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2016-12-19  9:46                               ` Mauro Carvalho Chehab
@ 2017-01-02  7:53                                 ` Sakari Ailus
  2017-01-24 10:49                                   ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2017-01-02  7:53 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Hans Verkuil, Laurent Pinchart, Shuah Khan, Sakari Ailus,
	linux-media, Helen Koike

Hi Mauro,

On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:
> Em Fri, 16 Dec 2016 17:07:23 +0200
> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> 
> > Hi Hans,
> 
> > > chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
> > > on release(). Thus ensuring that the cdev can never be removed while in an
> > > ioctl.  
> > 
> > It does, but it does not affect memory which is allocated separately of that.
> > 
> > See this:
> > 
> > <URL:https://www.mail-archive.com/linux-media@vger.kernel.org/msg106390.html>
> 
> That sounds promising. If this bug issues other drivers than OMAP3,
> then indeed the core has a bug.
> 
> I'll see if I can reproduce it here with some USB drivers later this week.

It's not a driver problem so yes, it is reproducible on other hardware.

> 
> > > If the omap3 is used as a testbed, then that's fine by me, but even then I
> > > probably wouldn't want the omap3 code that makes this possible in the kernel.
> > > It's just additional code for no purpose.  
> > 
> > The same problems exist on other devices, whether platform, pci or USB, as
> > the problems are in the core frameworks rather than (only) in the drivers.
> > 
> > On platform devices, this happens also when removing the module.
> > 
> > I've used omap3isp as an example since it demonstrates well the problems and
> > a lot of people have the hardware as well. Also, Mauro has requested all
> > drivers to be converted to the new API. I'm fine doing that for the actually
> > hot-pluggable hardware.
> 
> While IMHO it is overkill trying to support hot plug on omap3, I won't
> mind if you do that, provided that your patch series can be applied in
> a way that it won't cause regressions for real hot-pluggable hardware.

This is not really about the OMAP3 ISP driver hotplug support; it is indeed
about the framework's ability to support hotpluggable hardware. The current
pain point is removing hardware; the frameworks aren't quite up to that at
the moment.

I haven't checked how many plain V4L2 drivers do this correctly but the
problem domain becomes a lot more complex when you add V4L2 sub-device and
Media controller nodes. Having a driver that does implement this correctly
is important for writing new drivers, hence the changes to the OMAP3 ISP
driver.

> 
> I still think that keeping cdev embedded in a structure that it is
> created dynamically when registering the device node, instead of
> embedding it at struct media_device. Yet, if you prove that this does
> more harm than good, I'm ok on re-embeeding it. However, on such case,
> you need to put the patches re-embeeding it at the end of the patch
> series (and not at the beginning), as otherwise you'll be causing
> regressions.
> 
> > One additional reason is that as the omap3isp driver has been used as an
> > example to write a number of other drivers, people do see what's the right
> > way to do these things, instead of copying code from a driver doing it
> > wrong.
> 
> Interesting argument. Yet, IMHO, the best would be to do the proper
> review on the first platform driver that would support hot-plug,
> and use this as an example. It is a shame that project Aurora was
> discontinued, as media drivers for such kind of hardware would be an
> interesting example.
> 
> On that matter, just like we use vivid as a testbench and as an
> example for other drivers, it would be great if we could merge
> the vimc driver. What's the status of Helen's patchset?

That's a good point. I wasn't reviewing that driver back then when the
patches were posted, but should it go in, it should make a good example for
writing other drivers as well.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk


* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2017-01-02  7:53                                 ` Sakari Ailus
@ 2017-01-24 10:49                                   ` Mauro Carvalho Chehab
  2017-01-25 11:02                                     ` Sakari Ailus
  0 siblings, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2017-01-24 10:49 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Hans Verkuil, Laurent Pinchart, Shuah Khan, Sakari Ailus,
	linux-media, Helen Koike

Hi Sakari,

I just returned this week from vacation. I'm reading my long e-mail backlog,
starting from my main inbox...

On Mon, 2 Jan 2017 09:53:49 +0200
Sakari Ailus <sakari.ailus@iki.fi> wrote:

> Hi Mauro,
> 
> On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:
> > Em Fri, 16 Dec 2016 17:07:23 +0200
> > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> >   
> > > Hi Hans,  
> >   
> > > > chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
> > > > on release(). Thus ensuring that the cdev can never be removed while in an
> > > > ioctl.    
> > > 
> > > It does, but it does not affect memory which is allocated separately of that.
> > > 
> > > See this:
> > > 
> > > <URL:https://www.mail-archive.com/linux-media@vger.kernel.org/msg106390.html>  
> > 
> > That sounds promising. If this bug issues other drivers than OMAP3,
> > then indeed the core has a bug.
> > 
> > I'll see if I can reproduce it here with some USB drivers later this week.  
> 
> It's not a driver problem so yes, it is reproducible on other hardware.

I didn't have time to test it before going on vacation.

I guess I won't have any time this week to test those issues on
my hardware, as I suspect that my patch queue is full. Also, we're
approaching the next merge window. So, unfortunately, I won't have
much time these days to do much testing.

Btw, Hans commented that you were planning to work on it this month.

Do you have any news with regard to the media controller bind/unbind
fixes?

> > While IMHO it is overkill trying to support hot plug on omap3, I won't
> > mind if you do that, provided that your patch series can be applied in
> > a way that it won't cause regressions for real hot-pluggable hardware.  
> 
> This is not really about the OMAP3 ISP driver hotplug support; it is indeed
> about the framework's ability to support hotpluggable hardware. The current
> painpoint is removing hardware; the current frameworks aren't quite up to
> that at the moment.

The point here is that, while it would be fun to allow unbinding OMAP3
V4L2 drivers, OMAP3 doesn't really require hotplug support. On the other
hand, for USB drivers, where unbind is a requirement, the current status
of the tree is that hotplug works. I did some massive parallel bind/unbind
loops here to double check when we added those fixup patches. Granted, I
don't doubt that there are still some rare race conditions that I was
unable to reproduce at the time I tested. I also didn't try to hack the
Kernel to introduce extra delays to make those race conditions more
likely to happen.

Anyway, my main concern with this patch is that it breaks hotplug on devices
that really need it, while it fixes support only for OMAP3 (which doesn't need it).

Also, it starts with a series of patches that will cause regressions.

I wouldn't mind changing the solution to some other approach that would
work, provided that the patches are added in an incremental way and
won't introduce regressions to USB drivers.

> > On that matter, just like we use vivid as a testbench and as an
> > example for other drivers, it would be great if we could merge
> > the vimc driver. What's the status of Helen's patchset?  
> 
> That's a good point. I wasn't reviewing that driver back then when the
> patches were posted, but should it go in, it should make a good example for
> writing other drivers as well.
> 

I saw Laurent's comments about Helen's last patch series. From his
comments:

	"I've reviewed the whole patch but haven't had time to test it. I've also 
	 skipped the items marked as TODO or FIXME as they're obviously not ready yet 
	 :-) Overall this looks good to me, all the issues are minor."

Helen promised a new version fixing those minor issues. Perhaps we should merge
her next series upstream with such issues addressed and see how it behaves.

Thanks,
Mauro

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2017-01-24 10:49                                   ` Mauro Carvalho Chehab
@ 2017-01-25 11:02                                     ` Sakari Ailus
  2017-01-26  9:10                                       ` Mauro Carvalho Chehab
  0 siblings, 1 reply; 89+ messages in thread
From: Sakari Ailus @ 2017-01-25 11:02 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Hans Verkuil, Laurent Pinchart, Shuah Khan, Sakari Ailus,
	linux-media, Helen Koike

Hi Mauro,

On Tue, Jan 24, 2017 at 08:49:02AM -0200, Mauro Carvalho Chehab wrote:
> Hi Sakari,
> 
> Just returned this week from vacations. I'm reading my long e-mail backlog,
> starting from my main inbox...
> 
> Em Mon, 2 Jan 2017 09:53:49 +0200
> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> 
> > Hi Mauro,
> > 
> > On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:
> > > Em Fri, 16 Dec 2016 17:07:23 +0200
> > > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> > >   
> > > > Hi Hans,  
> > >   
> > > > > chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
> > > > > on release(). Thus ensuring that the cdev can never be removed while in an
> > > > > ioctl.    
> > > > 
> > > > It does, but it does not affect memory which is allocated separately from that.
> > > > 
> > > > See this:
> > > > 
> > > > <URL:https://www.mail-archive.com/linux-media@vger.kernel.org/msg106390.html>  
> > > 
> > > That sounds promising. If this bug affects drivers other than OMAP3,
> > > then indeed the core has a bug.
> > > 
> > > I'll see if I can reproduce it here with some USB drivers later this week.  
> > 
> > It's not a driver problem so yes, it is reproducible on other hardware.
> 
> Didn't have time to test it before going on vacation.
> 
> I guess I won't have any time this week to test those issues on
> my hardware, as I suspect that my patch queue is full. Also, we're
> approaching the next merge window. So, unfortunately, I won't have
> much time these days to do much testing.
> 
> Btw, Hans commented that you were planning to work on it this month.
> 
> Do you have some news with regards to the media controller bind/unbind
> fixes?

I have a bunch of meeting notes to send from the Oslo meeting with Hans and
Laurent; I should have that ready by the end of the week. The RFC patchset
certainly needs changes based on that.

> 
> > > While IMHO it is overkill trying to support hot plug on omap3, I won't
> > > mind if you do that, provided that your patch series can be applied in
> > > a way that it won't cause regressions for real hot-pluggable hardware.  
> > 
> > This is not really about the OMAP3 ISP driver hotplug support; it is indeed
> > about the framework's ability to support hotpluggable hardware. The current
> > painpoint is removing hardware; the current frameworks aren't quite up to
> > that at the moment.
> 
> The point here is that, while it would be fun to allow unbinding OMAP3
> V4L2 drivers, OMAP3 doesn't really require hotplug support. On the other
> hand, on USB drivers, where unbind is a requirement, the current status
> of the tree is that hotplug works. I did some massive parallel bind/unbind
> loops here to double check, when we added such fixup patches. Granted, I
> don't doubt that there are still some rare race conditions that I was
> unable to reproduce at the time I tested. I also didn't try to hack the
> kernel to introduce extra delays to make those race conditions more
> likely to happen.
> 
> Anyway, my main concern with this patch is that it breaks hotplug on devices
> that really need it, while it fixes support only for OMAP3 (which doesn't need it).

I don't disagree with you. Obviously the intent is not to break
hot-pluggable hardware, albeit the changes needed to avoid that haven't been
implemented yet. (One of the reasons it's been RFC all the time.)

> 
> Also, it starts with a series of patches that will cause regressions.
> 
> I wouldn't mind changing the solution to some other approach that would
> work, provided that the patches are added in an incremental way, and
> don't introduce regressions to USB drivers.

It may be possible to avoid increasing the time window during which bad
things could happen before fully removing them. However the patchset is a
lot easier to work with without bundling the reverts into other (and likely
multiple) patches as the reverted patches took quite a different direction
than is followed in this patchset.

Let's discuss this later, at the time when we have a patchset that produces
a sound code base (on top of that patchset) that is understood to be
free of object lifetime issues as far as hot-pluggable hardware goes.

> 
> > > On that matter, just like we use vivid as a testbench and as an
> > > example for other drivers, it would be great if we could merge
> > > the vimc driver. What's the status of Helen's patchset?  
> > 
> > That's a good point. I wasn't reviewing that driver back then when the
> > patches were posted, but should it go in, it should make a good example for
> > writing other drivers as well.
> > 
> 
> I saw Laurent's comments about Helen's last patch series. From his
> comments:
> 
> 	"I've reviewed the whole patch but haven't had time to test it. I've also 
> 	 skipped the items marked as TODO or FIXME as they're obviously not ready yet 
> 	 :-) Overall this looks good to me, all the issues are minor."
> 
> Helen promised a new version fixing those minor issues. Perhaps we should merge
> her next series upstream with such issues addressed and see how it behaves.

I'll review Helen's set next.

-- 
Kind regards,

Sakari Ailus
e-mail: sakari.ailus@iki.fi	XMPP: sailus@retiisi.org.uk

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2017-01-25 11:02                                     ` Sakari Ailus
@ 2017-01-26  9:10                                       ` Mauro Carvalho Chehab
  2017-05-30 23:41                                         ` Shuah Khan
  0 siblings, 1 reply; 89+ messages in thread
From: Mauro Carvalho Chehab @ 2017-01-26  9:10 UTC (permalink / raw)
  To: Sakari Ailus
  Cc: Hans Verkuil, Laurent Pinchart, Shuah Khan, Sakari Ailus,
	linux-media, Helen Koike

Em Wed, 25 Jan 2017 13:02:31 +0200
Sakari Ailus <sakari.ailus@iki.fi> escreveu:

> Hi Mauro,
> 
> On Tue, Jan 24, 2017 at 08:49:02AM -0200, Mauro Carvalho Chehab wrote:
> > Hi Sakari,
> > 
> > Just returned this week from vacations. I'm reading my long e-mail backlog,
> > starting from my main inbox...
> > 
> > Em Mon, 2 Jan 2017 09:53:49 +0200
> > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> >   
> > > Hi Mauro,
> > > 
> > > On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:  
> > > > Em Fri, 16 Dec 2016 17:07:23 +0200
> > > > Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> > > >     
> > > > > Hi Hans,    
> > > >     
> > > > > > chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
> > > > > > on release(). Thus ensuring that the cdev can never be removed while in an
> > > > > > ioctl.      
> > > > > 
> > > > > It does, but it does not affect memory which is allocated separately from that.
> > > > > 
> > > > > See this:
> > > > > 
> > > > > <URL:https://www.mail-archive.com/linux-media@vger.kernel.org/msg106390.html>    
> > > > 
> > > > That sounds promising. If this bug affects drivers other than OMAP3,
> > > > then indeed the core has a bug.
> > > > 
> > > > I'll see if I can reproduce it here with some USB drivers later this week.    
> > > 
> > > It's not a driver problem so yes, it is reproducible on other hardware.  
> > 
> > Didn't have time to test it before going on vacation.
> > 
> > I guess I won't have any time this week to test those issues on
> > my hardware, as I suspect that my patch queue is full. Also, we're
> > approaching the next merge window. So, unfortunately, I won't have
> > much time these days to do much testing.
> > 
> > Btw, Hans commented that you were planning to work on it this month.
> > 
> > Do you have some news with regards to the media controller bind/unbind
> > fixes?  
> 
> I have a bunch of meeting notes to send from the Oslo meeting with Hans and
> Laurent; I should have that ready by the end of the week. The RFC patchset
> certainly needs changes based on that.

OK. I'll wait for your notes and the new patchset.

> > > > While IMHO it is overkill trying to support hot plug on omap3, I won't
> > > > mind if you do that, provided that your patch series can be applied in
> > > > a way that it won't cause regressions for real hot-pluggable hardware.    
> > > 
> > > This is not really about the OMAP3 ISP driver hotplug support; it is indeed
> > > about the framework's ability to support hotpluggable hardware. The current
> > > painpoint is removing hardware; the current frameworks aren't quite up to
> > > that at the moment.  
> > 
> > The point here is that, while it would be fun to allow unbinding OMAP3
> > V4L2 drivers, OMAP3 doesn't really require hotplug support. On the other
> > hand, on USB drivers, where unbind is a requirement, the current status
> > of the tree is that hotplug works. I did some massive parallel bind/unbind
> > loops here to double check, when we added such fixup patches. Granted, I
> > don't doubt that there are still some rare race conditions that I was
> > unable to reproduce at the time I tested. I also didn't try to hack the
> > kernel to introduce extra delays to make those race conditions more
> > likely to happen.
> > 
> > Anyway, my main concern with this patch is that it breaks hotplug on devices
> > that really need it, while it fixes support only for OMAP3 (which doesn't need it).
> 
> I don't disagree with you. Obviously the intent is not to break
> hot-pluggable hardware, albeit the changes needed to avoid that haven't been
> implemented yet. (One of the reasons it's been RFC all the time.)
> 
> > 
> > Also, it starts with a series of patches that will cause regressions.
> > 
> > I wouldn't mind changing the solution to some other approach that would
> > work, provided that the patches are added in an incremental way, and
> > don't introduce regressions to USB drivers.
> 
> It may be possible to avoid increasing the time window during which bad
> things could happen before fully removing them.

The fix should be to protect those windows with either a kref, a lock or
a lockless (RCU) approach.
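
As a rough illustration of the kref option (a userspace sketch, not kernel
code): the demo_* names below are hypothetical, and C11 atomics stand in for
the kernel's struct kref, kref_get() and kref_put(). Unregistering only marks
the object as gone and drops the driver's reference; the memory is released
when the last holder, for instance an open file handle, puts its reference.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted media device (not the kernel struct). */
struct demo_mdev {
	atomic_int refcount;    /* plays the role of struct kref */
	atomic_bool registered; /* cleared by unregister */
	int *freed;             /* set to 1 on release so callers can observe it */
};

/* Allocate the object with one reference, owned by the "driver". */
struct demo_mdev *demo_mdev_alloc(int *freed)
{
	struct demo_mdev *m = malloc(sizeof(*m));

	if (!m)
		return NULL;
	atomic_init(&m->refcount, 1);
	atomic_init(&m->registered, true);
	m->freed = freed;
	*freed = 0;
	return m;
}

/* Take an extra reference, e.g. when a file handle is opened. */
void demo_mdev_get(struct demo_mdev *m)
{
	int old = atomic_fetch_add(&m->refcount, 1);

	assert(old > 0);
	(void)old;
}

/* Drop a reference; the last put releases the memory. */
void demo_mdev_put(struct demo_mdev *m)
{
	int old = atomic_fetch_sub(&m->refcount, 1);

	assert(old > 0);
	if (old == 1) {
		*m->freed = 1;
		free(m);
	}
}

/* Unregistering does not free anything by itself. */
void demo_mdev_unregister(struct demo_mdev *m)
{
	atomic_store(&m->registered, false);
	demo_mdev_put(m); /* drop the driver's reference */
}

/* An ioctl-like entry point that is still safe to call after unregister. */
int demo_mdev_ioctl(struct demo_mdev *m)
{
	return atomic_load(&m->registered) ? 0 : -ENODEV;
}
```

With this shape, a file handle that took a reference at open time can still
reach the object after unregister, gets -ENODEV from it, and releases the
memory itself when it is closed.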

> However the patchset is a
> lot easier to work with without bundling the reverts into other (and likely
> multiple) patches as the reverted patches took quite a different direction
> than is followed in this patchset.

Doing the reverts before doing the fixes does break things. What you're
reverting is basically the logic that unbinds struct media_devnode
from struct media_device. This is independent of whatever changes
you would be making to struct media_device. So, you could do all the changes
there, apply such changes to OMAP3 and to the USB drivers, and then
rebind struct media_devnode to struct media_device[1].

[1] assuming that everyone agrees that rebinding it is for the best.
I still think that having a separate struct is better - but this is
something that I'll analyze again after seeing the whole picture after
your changes - and the rationale for it.
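
For readers following along, the two layouts under discussion can be sketched
roughly as below. This is a simplified userspace illustration: all demo_*
structs and helpers are hypothetical, and the back pointer in the split
variant is an assumption about how a separately allocated devnode would find
its parent. With an embedded devnode the two objects share one allocation and
one lifetime; with a separate devnode each has its own lifetime and the link
between them can be cut at unregister time.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() helper. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Layout A: devnode embedded in the device (one allocation, one lifetime). */
struct demo_devnode_embedded {
	int minor;
};

struct demo_device_embedded {
	int id;
	struct demo_devnode_embedded devnode;
};

/* Layout B: devnode allocated separately and linked by pointers. */
struct demo_device_split;

struct demo_devnode_split {
	int minor;
	struct demo_device_split *media_dev; /* back pointer, NULL once unbound */
};

struct demo_device_split {
	int id;
	struct demo_devnode_split *devnode;
};

/* Layout A: the parent is recovered by pointer arithmetic and always exists. */
struct demo_device_embedded *demo_from_embedded(struct demo_devnode_embedded *n)
{
	assert(n != NULL);
	return demo_container_of(n, struct demo_device_embedded, devnode);
}

/* Layout B: the parent is looked up through the back pointer and may be gone. */
struct demo_device_split *demo_from_split(struct demo_devnode_split *n)
{
	assert(n != NULL);
	return n->media_dev;
}
```

In layout B every user of the devnode has to cope with a NULL parent, which is
the price of letting the devnode outlive the device; in layout A the question
does not arise, but the device memory must stay around for as long as the
devnode is referenced.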

> 
> Let's discuss this later, at the time when we have a patchset that produces
> a sound code base (on top of that patchset) that is understood to be
> free of object lifetime issues as far as hot-pluggable hardware goes.

Let's discuss it later when you submit your newer RFC patchset on the
top of the upstream code.

> >   
> > > > On that matter, just like we use vivid as a testbench and as an
> > > > example for other drivers, it would be great if we could merge
> > > > the vimc driver. What's the status of Helen's patchset?    
> > > 
> > > That's a good point. I wasn't reviewing that driver back then when the
> > > patches were posted, but should it go in, it should make a good example for
> > > writing other drivers as well.
> > >   
> > 
> > I saw Laurent's comments about Helen's last patch series. From his
> > comments:
> > 
> > 	"I've reviewed the whole patch but haven't had time to test it. I've also 
> > 	 skipped the items marked as TODO or FIXME as they're obviously not ready yet 
> > 	 :-) Overall this looks good to me, all the issues are minor."
> > 
> > Helen promised a new version fixing those minor issues. Perhaps we should merge
> > her next series upstream with such issues addressed and see how it behaves.  
> 
> I'll review Helen's set next.

Thanks!


Regards,
Mauro

* Re: [RFC v3 00/21] Make use of kref in media device, grab references as needed
  2017-01-26  9:10                                       ` Mauro Carvalho Chehab
@ 2017-05-30 23:41                                         ` Shuah Khan
  0 siblings, 0 replies; 89+ messages in thread
From: Shuah Khan @ 2017-05-30 23:41 UTC (permalink / raw)
  To: Mauro Carvalho Chehab, Sakari Ailus
  Cc: Hans Verkuil, Laurent Pinchart, Sakari Ailus, linux-media,
	Helen Koike, Shuah Khan, Shuah Khan

Hi Sailus/Mauro,

On 01/26/2017 02:10 AM, Mauro Carvalho Chehab wrote:
> Em Wed, 25 Jan 2017 13:02:31 +0200
> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
> 
>> Hi Mauro,
>>
>> On Tue, Jan 24, 2017 at 08:49:02AM -0200, Mauro Carvalho Chehab wrote:
>>> Hi Sakari,
>>>
>>> Just returned this week from vacations. I'm reading my long e-mail backlog,
>>> starting from my main inbox...
>>>
>>> Em Mon, 2 Jan 2017 09:53:49 +0200
>>> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
>>>   
>>>> Hi Mauro,
>>>>
>>>> On Mon, Dec 19, 2016 at 07:46:55AM -0200, Mauro Carvalho Chehab wrote:  
>>>>> Em Fri, 16 Dec 2016 17:07:23 +0200
>>>>> Sakari Ailus <sakari.ailus@iki.fi> escreveu:
>>>>>     
>>>>>> Hi Hans,    
>>>>>     
>>>>>>> chrdev_open in fs/char_dev.c increases the refcount on open() and decreases it
>>>>>>> on release(). Thus ensuring that the cdev can never be removed while in an
>>>>>>> ioctl.      
>>>>>>
>>>>>> It does, but it does not affect memory which is allocated separately from that.
>>>>>>
>>>>>> See this:
>>>>>>
>>>>>> <URL:https://www.mail-archive.com/linux-media@vger.kernel.org/msg106390.html>    
>>>>>
>>>>> That sounds promising. If this bug affects drivers other than OMAP3,
>>>>> then indeed the core has a bug.
>>>>>
>>>>> I'll see if I can reproduce it here with some USB drivers later this week.    
>>>>
>>>> It's not a driver problem so yes, it is reproducible on other hardware.  
>>>
>>> Didn't have time to test it before going on vacation.
>>>
>>> I guess I won't have any time this week to test those issues on
>>> my hardware, as I suspect that my patch queue is full. Also, we're
>>> approaching the next merge window. So, unfortunately, I won't have
>>> much time these days to do much testing.
>>>
>>> Btw, Hans commented that you were planning to work on it this month.
>>>
>>> Do you have some news with regards to the media controller bind/unbind
>>> fixes?  
>>
>> I have a bunch of meeting notes to send from the Oslo meeting with Hans and
>> Laurent; I should have that ready by the end of the week. The RFC patchset
>> certainly needs changes based on that.
> 
> OK. I'll wait for your notes and the new patchset.

What is the status of this patch series? Did I miss RFC v4?

As you might remember, my resource sharing work for snd-usb-audio
and the shared media object API which is necessary for media
driver and snd-usb-audio to share the media device are pending
waiting for this RFC series to go from RFC to a version that can
be merged.

I would like to get the snd-usb-audio work done soon and target it
for an upcoming release in the near future!

Could you please send an update on the status, and on when the next RFC
version might be sent out?

thanks,
-- Shuah

end of thread, other threads:[~2017-05-30 23:42 UTC | newest]

Thread overview: 89+ messages
2016-08-26 23:43 [RFC v3 00/21] Make use of kref in media device, grab references as needed Sakari Ailus
2016-08-26 23:43 ` [RFC v3 01/21] Revert "[media] media: fix media devnode ioctl/syscall and unregister race" Sakari Ailus
2016-08-26 23:43 ` [RFC v3 02/21] Revert "[media] media: fix use-after-free in cdev_put() when app exits after driver unbind" Sakari Ailus
2016-08-26 23:43 ` [RFC v3 03/21] Revert "[media] media-device: dynamically allocate struct media_devnode" Sakari Ailus
2016-08-26 23:43 ` [RFC v3 04/21] media: Remove useless curly braces and parentheses Sakari Ailus
2016-08-26 23:43 ` [RFC v3 05/21] media: devnode: Rename mdev argument as devnode Sakari Ailus
2016-08-26 23:43 ` [RFC v3 06/21] media device: Drop nop release callback Sakari Ailus
2016-08-26 23:43 ` [RFC v3 07/21] media-device: Make devnode.dev->kobj parent of devnode.cdev Sakari Ailus
2016-08-26 23:43 ` [RFC v3 08/21] media: Enable allocating the media device dynamically Sakari Ailus
2016-08-26 23:43 ` [RFC v3 09/21] media: Split initialising and adding media devnode Sakari Ailus
2016-08-26 23:43 ` [RFC v3 10/21] media: Shuffle functions around Sakari Ailus
2016-08-26 23:43 ` [RFC v3 11/21] media device: Refcount the media device Sakari Ailus
2016-08-26 23:43 ` [RFC v3 12/21] media device: Initialise media devnode in media_device_init() Sakari Ailus
2016-08-26 23:43 ` [RFC v3 13/21] media device: Deprecate media_device_{init,cleanup}() for drivers Sakari Ailus
2016-08-26 23:43 ` [RFC v3 14/21] media device: Get the media device driver's device Sakari Ailus
2016-08-26 23:43 ` [RFC v3 15/21] media: Provide a way to the driver to set a private pointer Sakari Ailus
2016-08-26 23:43 ` [RFC v3 16/21] media: Add release callback for media device Sakari Ailus
2016-08-26 23:43 ` [RFC v3 17/21] v4l: Acquire a reference to the media device for every video device Sakari Ailus
2016-08-26 23:43 ` [RFC v3 18/21] media-device: Postpone graph object removal until free Sakari Ailus
2016-08-26 23:43 ` [RFC v3 19/21] omap3isp: Allocate the media device dynamically Sakari Ailus
2016-08-26 23:43 ` [RFC v3 20/21] omap3isp: Release the isp device struct by media device callback Sakari Ailus
2016-08-26 23:43 ` [RFC v3 21/21] omap3isp: Don't rely on devm for memory resource management Sakari Ailus
2016-12-15 11:23   ` Laurent Pinchart
2016-12-15 11:39     ` Sakari Ailus
2016-12-15 11:42       ` Laurent Pinchart
2016-12-15 11:45         ` Sakari Ailus
2016-12-15 11:57           ` Laurent Pinchart
2016-12-15 19:17             ` Shuah Khan
2016-12-16 13:32     ` Sakari Ailus
2016-12-16 14:39       ` Shuah Khan
2016-11-07 20:16 ` [RFC v3 00/21] Make use of kref in media device, grab references as needed Shuah Khan
2016-11-08  8:19   ` Sakari Ailus
2016-11-09 16:49     ` Shuah Khan
2016-11-09 17:00       ` Shuah Khan
2016-11-09 17:46         ` Mauro Carvalho Chehab
2016-11-14 13:27           ` Sakari Ailus
2016-11-22 17:44             ` Mauro Carvalho Chehab
2016-11-22 18:13               ` Hans Verkuil
2016-11-22 18:41                 ` Shuah Khan
2016-11-22 22:56               ` Shuah Khan
2016-11-28 10:45               ` Sakari Ailus
2016-11-29 11:13                 ` Mauro Carvalho Chehab
2016-12-13 10:53                   ` Sakari Ailus
2016-12-13 12:24                     ` Mauro Carvalho Chehab
2016-12-13 22:23                       ` Shuah Khan
2016-12-15 10:39                         ` Laurent Pinchart
2016-12-15 14:56                           ` Shuah Khan
2016-12-16 16:58                             ` Laurent Pinchart
2016-12-15 11:30                       ` Sakari Ailus
2016-12-15 12:56                         ` Laurent Pinchart
2016-12-15 14:03                           ` Hans Verkuil
2016-12-15 14:32                             ` Mauro Carvalho Chehab
2016-12-15 14:45                               ` Hans Verkuil
2016-12-15 15:45                                 ` Mauro Carvalho Chehab
2016-12-15 16:07                                   ` Hans Verkuil
2016-12-16 16:47                                   ` Laurent Pinchart
2016-12-16 16:43                               ` Laurent Pinchart
2016-12-15 14:45                             ` Shuah Khan
2016-12-15 15:26                               ` Hans Verkuil
2016-12-15 16:06                                 ` Shuah Khan
2016-12-15 16:28                                   ` Hans Verkuil
2016-12-15 17:09                                     ` Shuah Khan
2016-12-15 17:25                                       ` Mauro Carvalho Chehab
2016-12-15 17:51                                         ` Shuah Khan
2016-12-16 10:11                                           ` Hans Verkuil
2016-12-16 10:57                                             ` Mauro Carvalho Chehab
2016-12-16 11:27                                               ` Hans Verkuil
2016-12-16 12:00                                                 ` Mauro Carvalho Chehab
2016-12-16 14:45                                                   ` Hans Verkuil
2016-12-19  9:28                                                     ` Media summit in Feb? - Was: " Mauro Carvalho Chehab
2016-12-21  1:31                                                       ` Mauro Carvalho Chehab
2016-12-21 14:27                                                         ` Shuah Khan
2016-12-22 17:47                                                         ` Laurent Pinchart
2016-12-22 20:43                                                           ` Mauro Carvalho Chehab
2016-12-16 10:03                                       ` Hans Verkuil
2016-12-16 10:12                                         ` Mauro Carvalho Chehab
2016-12-23 18:13                                   ` Laurent Pinchart
2016-12-15 17:08                                 ` Mauro Carvalho Chehab
2016-12-23 17:55                                   ` Laurent Pinchart
2016-12-23 17:48                                 ` Laurent Pinchart
2016-12-23 17:27                               ` Laurent Pinchart
2016-12-16 15:07                             ` Sakari Ailus
2016-12-16 16:34                               ` Laurent Pinchart
2016-12-19  9:46                               ` Mauro Carvalho Chehab
2017-01-02  7:53                                 ` Sakari Ailus
2017-01-24 10:49                                   ` Mauro Carvalho Chehab
2017-01-25 11:02                                     ` Sakari Ailus
2017-01-26  9:10                                       ` Mauro Carvalho Chehab
2017-05-30 23:41                                         ` Shuah Khan
