* [PATCH v2 0/2] Synchronize DT overlay removal with devlink removals
@ 2024-02-29  8:39 Herve Codina
  2024-02-29  8:39 ` [PATCH v2 1/2] driver core: Introduce device_link_wait_removal() Herve Codina
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Herve Codina @ 2024-02-29  8:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand
  Cc: Lizhi Hou, Max Zhen, Sonal Santan, Stefano Stabellini,
	Jonathan Cameron, linux-kernel, devicetree, Allan Nielsen,
	Horatiu Vultur, Steen Hegelund, Luca Ceresoli, Nuno Sa,
	Thomas Petazzoni, Herve Codina

Hi,

In the following sequence:
  of_platform_depopulate(); /* Remove devices from a DT overlay node */
  of_overlay_remove(); /* Remove the DT overlay node itself */

Some warnings are raised by __of_changeset_entry_destroy(), which is
called from of_overlay_remove():
  ERROR: memory leak, expected refcount 1 instead of 2 ...

The issue is that, during the devlink removals triggered by
of_platform_depopulate(), jobs are put in a workqueue.
These jobs drop the references to the devices. When a device is no longer
referenced (refcount == 0), it is released and the reference to its
of_node is dropped by a call to of_node_put().
These operations are fully correct except that, because of the
workqueue, they are done asynchronously with respect to function calls.

In the sequence provided, the jobs are run too late, after the call to
__of_changeset_entry_destroy() and so a missing of_node_put() call is
detected by __of_changeset_entry_destroy().
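
The ordering problem described above can be sketched in user space. This is
only a minimal model, not kernel code: every model_* name is hypothetical,
the workqueue is a plain array, and the "device" is reduced to a node
refcount. It shows the refcount check seeing 2 when the queued put has not
run yet, and 1 once the pending jobs have been run first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device tree node: only its refcount is modeled. */
struct model_node {
	int refcount;
};

/* A deferred job: when run, it drops one reference on its node. */
struct model_job {
	struct model_node *node;
};

#define MODEL_MAX_JOBS 8

/* Hypothetical stand-in for the devlink removal workqueue. */
struct model_wq {
	struct model_job jobs[MODEL_MAX_JOBS];
	size_t count;
};

/* Models the removal path deferring its of_node_put() to a work item. */
static void model_queue_put(struct model_wq *wq, struct model_node *node)
{
	assert(wq->count < MODEL_MAX_JOBS);
	wq->jobs[wq->count++].node = node;
}

/* Models waiting for removals: run every pending job to completion. */
static void model_wait_removal(struct model_wq *wq)
{
	for (size_t i = 0; i < wq->count; i++)
		wq->jobs[i].node->refcount--;
	wq->count = 0;
}

/*
 * Models the refcount check done when the changeset entry is destroyed.
 * Returns the refcount seen by the check; 1 is the expected value.
 */
static int model_overlay_remove(int wait_first)
{
	/* One reference held by the changeset, one by the device. */
	struct model_node node = { .refcount = 2 };
	struct model_wq wq = { .count = 0 };
	int seen;

	/* Step 1: depopulate; the device's reference drop is only queued. */
	model_queue_put(&wq, &node);

	/* Step 2: optionally synchronize, then check the refcount. */
	if (wait_first)
		model_wait_removal(&wq);
	seen = node.refcount;

	/* The deferred job eventually runs either way. */
	model_wait_removal(&wq);
	return seen;
}
```

In this model, model_overlay_remove(0) corresponds to the current behaviour
(the check runs before the deferred put) and model_overlay_remove(1) to the
behaviour this series aims for.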

This series fixes this issue by introducing device_link_wait_removal() in
order to wait for the end of jobs execution (patch 1) and using this
function to synchronize the overlay removal with the end of jobs
execution (patch 2).

Compared to the previous iteration:
  https://lore.kernel.org/linux-kernel/20231130174126.688486-1-herve.codina@bootlin.com/
this v2 series mainly:
- Renames the workqueue used.
- Calls device_link_wait_removal() a bit later to handle cases reported
  by Luca [1] and Nuno [2].
  [1]: https://lore.kernel.org/all/20231220181627.341e8789@booty/
  [2]: https://lore.kernel.org/all/20240205-fix-device-links-overlays-v2-2-5344f8c79d57@analog.com/

Best regards,
Hervé

Changes v1 -> v2
  - Patch 1
    Rename the workqueue to 'device_link_wq'
    Add 'Fixes' tag and Cc stable

  - Patch 2
    Add device.h inclusion.
    Call device_link_wait_removal() later in the overlay removal
    sequence (i.e. in free_overlay_changeset() function).
    Drop of_mutex lock while calling device_link_wait_removal().
    Add 'Fixes' tag and Cc stable

Herve Codina (2):
  driver core: Introduce device_link_wait_removal()
  of: overlay: Synchronize of_overlay_remove() with the devlink removals

 drivers/base/core.c    | 26 +++++++++++++++++++++++---
 drivers/of/overlay.c   |  9 ++++++++-
 include/linux/device.h |  1 +
 3 files changed, 32 insertions(+), 4 deletions(-)

-- 
2.43.0



* [PATCH v2 1/2] driver core: Introduce device_link_wait_removal()
  2024-02-29  8:39 [PATCH v2 0/2] Synchronize DT overlay removal with devlink removals Herve Codina
@ 2024-02-29  8:39 ` Herve Codina
  2024-02-29  9:43   ` Nuno Sá
  2024-02-29  8:39 ` [PATCH v2 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals Herve Codina
  2024-02-29 10:55 ` [PATCH v2 0/2] Synchronize DT overlay removal with " Herve Codina
  2 siblings, 1 reply; 9+ messages in thread
From: Herve Codina @ 2024-02-29  8:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand
  Cc: Lizhi Hou, Max Zhen, Sonal Santan, Stefano Stabellini,
	Jonathan Cameron, linux-kernel, devicetree, Allan Nielsen,
	Horatiu Vultur, Steen Hegelund, Luca Ceresoli, Nuno Sa,
	Thomas Petazzoni, Herve Codina, stable

Commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
introduced a workqueue to release the consumer and supplier devices used
in the devlink.
In the queued job, devices are released and in turn, when all the
references to these devices are dropped, the release function of the
device itself is called.

Nothing is available to synchronise with this workqueue in order to
ensure that all ongoing release operations are done so that other
operations can be started safely.

For instance, in the following sequence:
  1) of_platform_depopulate()
  2) of_overlay_remove()

During step 1, devices are released and related devlinks are removed
(jobs pushed in the workqueue).
During step 2, OF nodes are destroyed but, without any
synchronisation with devlink removal jobs, of_overlay_remove() can raise
warnings related to missing of_node_put():
  ERROR: memory leak, expected refcount 1 instead of 2

Indeed, the missing of_node_put() call will be done, but too late,
from the workqueue job.

Introduce device_link_wait_removal() to offer a way to synchronize
operations waiting for the end of devlink removals (i.e. end of
workqueue jobs).
Also, as a flushing operation is done on the workqueue, the workqueue
used is moved from a system-wide workqueue to a local one.

Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
Cc: stable@vger.kernel.org
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
---
 drivers/base/core.c    | 26 +++++++++++++++++++++++---
 include/linux/device.h |  1 +
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index d5f4e4aac09b..80d9430856a8 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
 static void __fw_devlink_link_to_consumers(struct device *dev);
 static bool fw_devlink_drv_reg_done;
 static bool fw_devlink_best_effort;
+static struct workqueue_struct *device_link_wq;
 
 /**
  * __fwnode_link_add - Create a link between two fwnode_handles.
@@ -532,12 +533,26 @@ static void devlink_dev_release(struct device *dev)
 	/*
 	 * It may take a while to complete this work because of the SRCU
 	 * synchronization in device_link_release_fn() and if the consumer or
-	 * supplier devices get deleted when it runs, so put it into the "long"
-	 * workqueue.
+	 * supplier devices get deleted when it runs, so put it into the
+	 * dedicated workqueue.
 	 */
-	queue_work(system_long_wq, &link->rm_work);
+	queue_work(device_link_wq, &link->rm_work);
 }
 
+/**
+ * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
+ */
+void device_link_wait_removal(void)
+{
+	/*
+	 * devlink removal jobs are queued in the dedicated work queue.
+	 * To be sure that all removal jobs are terminated, ensure that any
+	 * scheduled work has run to completion.
+	 */
+	drain_workqueue(device_link_wq);
+}
+EXPORT_SYMBOL_GPL(device_link_wait_removal);
+
 static struct class devlink_class = {
 	.name = "devlink",
 	.dev_groups = devlink_groups,
@@ -4099,9 +4114,14 @@ int __init devices_init(void)
 	sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
 	if (!sysfs_dev_char_kobj)
 		goto char_kobj_err;
+	device_link_wq = alloc_workqueue("device_link_wq", 0, 0);
+	if (!device_link_wq)
+		goto wq_err;
 
 	return 0;
 
+ wq_err:
+	kobject_put(sysfs_dev_char_kobj);
  char_kobj_err:
 	kobject_put(sysfs_dev_block_kobj);
  block_kobj_err:
diff --git a/include/linux/device.h b/include/linux/device.h
index 1795121dee9a..d7d8305a72e8 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1249,6 +1249,7 @@ void device_link_del(struct device_link *link);
 void device_link_remove(void *consumer, struct device *supplier);
 void device_links_supplier_sync_state_pause(void);
 void device_links_supplier_sync_state_resume(void);
+void device_link_wait_removal(void);
 
 /* Create alias, so I can be autoloaded. */
 #define MODULE_ALIAS_CHARDEV(major,minor) \
-- 
2.43.0



* [PATCH v2 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals
  2024-02-29  8:39 [PATCH v2 0/2] Synchronize DT overlay removal with devlink removals Herve Codina
  2024-02-29  8:39 ` [PATCH v2 1/2] driver core: Introduce device_link_wait_removal() Herve Codina
@ 2024-02-29  8:39 ` Herve Codina
  2024-02-29  9:47   ` Nuno Sá
  2024-02-29  9:50   ` Nuno Sá
  2024-02-29 10:55 ` [PATCH v2 0/2] Synchronize DT overlay removal with " Herve Codina
  2 siblings, 2 replies; 9+ messages in thread
From: Herve Codina @ 2024-02-29  8:39 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand
  Cc: Lizhi Hou, Max Zhen, Sonal Santan, Stefano Stabellini,
	Jonathan Cameron, linux-kernel, devicetree, Allan Nielsen,
	Horatiu Vultur, Steen Hegelund, Luca Ceresoli, Nuno Sa,
	Thomas Petazzoni, Herve Codina, stable

In the following sequence:
  1) of_platform_depopulate()
  2) of_overlay_remove()

During step 1, devices are destroyed and devlinks are removed.
During step 2, OF nodes are destroyed but
__of_changeset_entry_destroy() can raise warnings related to missing
of_node_put():
  ERROR: memory leak, expected refcount 1 instead of 2 ...

Indeed, during the devlink removals performed at step 1, the removal
itself, which releases the device (and the attached of_node), is done by
a job queued in a workqueue and so it is done asynchronously with respect
to function calls.
When the warning is present, of_node_put() will be called, but too late,
from the workqueue job.

In order to be sure that any ongoing devlink removals are done before
the of_node destruction, synchronize the of_overlay_remove() with the
devlink removals.

Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
Cc: stable@vger.kernel.org
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
---
 drivers/of/overlay.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
index 2ae7e9d24a64..99659ae9fb28 100644
--- a/drivers/of/overlay.c
+++ b/drivers/of/overlay.c
@@ -853,6 +853,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs)
 {
 	int i;
 
+	/*
+	 * Wait for any ongoing device link removals before removing some of
+	 * nodes. Drop the global lock while waiting
+	 */
+	mutex_unlock(&of_mutex);
+	device_link_wait_removal();
+	mutex_lock(&of_mutex);
+
 	if (ovcs->cset.entries.next)
 		of_changeset_destroy(&ovcs->cset);
 
@@ -862,7 +870,6 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs)
 		ovcs->id = 0;
 	}
 
-
 	for (i = 0; i < ovcs->count; i++) {
 		of_node_put(ovcs->fragments[i].target);
 		of_node_put(ovcs->fragments[i].overlay);
-- 
2.43.0



* Re: [PATCH v2 1/2] driver core: Introduce device_link_wait_removal()
  2024-02-29  8:39 ` [PATCH v2 1/2] driver core: Introduce device_link_wait_removal() Herve Codina
@ 2024-02-29  9:43   ` Nuno Sá
  0 siblings, 0 replies; 9+ messages in thread
From: Nuno Sá @ 2024-02-29  9:43 UTC (permalink / raw)
  To: Herve Codina, Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring,
	Frank Rowand
  Cc: Lizhi Hou, Max Zhen, Sonal Santan, Stefano Stabellini,
	Jonathan Cameron, linux-kernel, devicetree, Allan Nielsen,
	Horatiu Vultur, Steen Hegelund, Luca Ceresoli, Nuno Sa,
	Thomas Petazzoni, stable

Hi Herve,

Thanks for moving this forward... A couple of comments.

On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
> The commit 80dd33cf72d1 ("drivers: base: Fix device link removal")
> introduces a workqueue to release the consumer and supplier devices used
> in the devlink.
> In the job queued, devices are release and in turn, when all the
> references to these devices are dropped, the release function of the
> device itself is called.
> 
> Nothing is present to provide some synchronisation with this workqueue
> in order to ensure that all ongoing releasing operations are done and
> so, some other operations can be started safely.
> 
> For instance, in the following sequence:
>   1) of_platform_depopulate()
>   2) of_overlay_remove()
> 
> During the step 1, devices are released and related devlinks are removed
> (jobs pushed in the workqueue).
> During the step 2, OF nodes are destroyed but, without any
> synchronisation with devlink removal jobs, of_overlay_remove() can raise
> warnings related to missing of_node_put():
>   ERROR: memory leak, expected refcount 1 instead of 2
> 
> Indeed, the missing of_node_put() call is going to be done, too late,
> from the workqueue job execution.
> 
> Introduce device_link_wait_removal() to offer a way to synchronize
> operations waiting for the end of devlink removals (i.e. end of
> workqueue jobs).
> Also, as a flushing operation is done on the workqueue, the workqueue
> used is moved from a system-wide workqueue to a local one.
> 
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> ---
>  drivers/base/core.c    | 26 +++++++++++++++++++++++---
>  include/linux/device.h |  1 +
>  2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index d5f4e4aac09b..80d9430856a8 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -44,6 +44,7 @@ static bool fw_devlink_is_permissive(void);
>  static void __fw_devlink_link_to_consumers(struct device *dev);
>  static bool fw_devlink_drv_reg_done;
>  static bool fw_devlink_best_effort;
> +static struct workqueue_struct *device_link_wq;
>  
>  /**
>   * __fwnode_link_add - Create a link between two fwnode_handles.
> @@ -532,12 +533,26 @@ static void devlink_dev_release(struct device *dev)
>  	/*
>  	 * It may take a while to complete this work because of the SRCU
>  	 * synchronization in device_link_release_fn() and if the consumer or
> -	 * supplier devices get deleted when it runs, so put it into the "long"
> -	 * workqueue.
> +	 * supplier devices get deleted when it runs, so put it into the
> +	 * dedicated workqueue.
>  	 */
> -	queue_work(system_long_wq, &link->rm_work);
> +	queue_work(device_link_wq, &link->rm_work);
>  }
>  
> +/**
> + * device_link_wait_removal - Wait for ongoing devlink removal jobs to terminate
> + */
> +void device_link_wait_removal(void)
> +{
> +	/*
> +	 * devlink removal jobs are queued in the dedicated work queue.
> +	 * To be sure that all removal jobs are terminated, ensure that any
> +	 * scheduled work has run to completion.
> +	 */
> +	drain_workqueue(device_link_wq);

I'm still not convinced we can have a recursive call into devlinks removal so I
do think flush_workqueue() is enough. I will defer to Saravana though...
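
To make the flush versus drain distinction concrete, here is a small
user-space model. It is a sketch under stated assumptions, not the kernel
workqueue implementation: all model_* names are hypothetical, and "flush" is
modeled as waiting only for the jobs that were queued when it was called,
while "drain" repeats until the queue is empty. The two differ only when a
running job queues further work on the same queue.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_JOBS 16

/*
 * Hypothetical work queue model: each job holds the number of follow-up
 * jobs it will still chain (0 means it queues nothing further).
 */
struct model_wq {
	int chain[MODEL_MAX_JOBS];
	size_t head;	/* next job to run */
	size_t tail;	/* next free slot */
};

static void model_queue(struct model_wq *wq, int chain)
{
	assert(wq->tail < MODEL_MAX_JOBS);
	wq->chain[wq->tail++] = chain;
}

/* Run one job; a job with chain > 0 queues a follow-up with chain - 1. */
static void model_run_one(struct model_wq *wq)
{
	int chain = wq->chain[wq->head++];

	if (chain > 0)
		model_queue(wq, chain - 1);
}

/* Flush model: wait only for the jobs that were queued when called. */
static void model_flush(struct model_wq *wq)
{
	size_t end = wq->tail;

	while (wq->head < end)
		model_run_one(wq);
}

/* Drain model: keep flushing until the queue is empty. */
static void model_drain(struct model_wq *wq)
{
	while (wq->head < wq->tail)
		model_flush(wq);
}

/* Returns the number of jobs still pending after flushing or draining. */
static size_t model_pending_after(int use_drain, int chain)
{
	struct model_wq wq = { .head = 0, .tail = 0 };

	model_queue(&wq, chain);
	if (use_drain)
		model_drain(&wq);
	else
		model_flush(&wq);
	return wq.tail - wq.head;
}
```

If removal jobs never queue further removal jobs (chain == 0 in the model),
both variants leave nothing pending, which is the case where a plain flush
would be enough.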

> +}
> +EXPORT_SYMBOL_GPL(device_link_wait_removal);
> +
>  static struct class devlink_class = {
>  	.name = "devlink",
>  	.dev_groups = devlink_groups,
> @@ -4099,9 +4114,14 @@ int __init devices_init(void)
>  	sysfs_dev_char_kobj = kobject_create_and_add("char", dev_kobj);
>  	if (!sysfs_dev_char_kobj)
>  		goto char_kobj_err;
> +	device_link_wq = alloc_workqueue("device_link_wq", 0, 0);
> +	if (!device_link_wq)
> +		goto wq_err;

I still think this makes more sense in devlink_class_init(), as this is
really device link specific. Moreover, as I said to Saravana, we need to
"convince" Rafael about this, as he (in my series) did not agree with
erroring out in case we fail to allocate the queue.

Rafael?

- Nuno Sá



* Re: [PATCH v2 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals
  2024-02-29  8:39 ` [PATCH v2 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals Herve Codina
@ 2024-02-29  9:47   ` Nuno Sá
  2024-02-29  9:50   ` Nuno Sá
  1 sibling, 0 replies; 9+ messages in thread
From: Nuno Sá @ 2024-02-29  9:47 UTC (permalink / raw)
  To: Herve Codina, Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring,
	Frank Rowand
  Cc: Lizhi Hou, Max Zhen, Sonal Santan, Stefano Stabellini,
	Jonathan Cameron, linux-kernel, devicetree, Allan Nielsen,
	Horatiu Vultur, Steen Hegelund, Luca Ceresoli, Nuno Sa,
	Thomas Petazzoni, stable

On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
> In the following sequence:
>   1) of_platform_depopulate()
>   2) of_overlay_remove()
> 
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
>   ERROR: memory leak, expected refcount 1 instead of 2 ...
> 
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
> 
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_overlay_remove() with the
> devlink removals.
> 
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> ---
>  drivers/of/overlay.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index 2ae7e9d24a64..99659ae9fb28 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c
> @@ -853,6 +853,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs)
>  {
>  	int i;
>  
> +	/*
> +	 * Wait for any ongoing device link removals before removing some of
> +	 * nodes. Drop the global lock while waiting
> +	 */
> +	mutex_unlock(&of_mutex);
> +	device_link_wait_removal();
> +	mutex_lock(&of_mutex);

I'm still not convinced we need to drop the lock. What happens if someone else
grabs the lock while we are in device_link_wait_removal()? Can we guarantee that
we can't screw things badly?

The question is, do you have a system/use case where you can really see the
deadlock happening? Until I see one, I'm very skeptical about this. And if we
have one, I'm not really sure this is also the right solution for it.
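
The concern about dropping the lock can be illustrated with a user-space
model. This is only a sketch: the model_* names are hypothetical, the lock
is a plain flag, and the "other caller" is a hook run during the wait. It
does not show that such an interleaving exists in the overlay code; it only
shows that a value read under the lock may differ after an unlock/wait/lock
sequence, whereas it cannot change if the lock is held across the wait.

```c
#include <assert.h>

/* Hypothetical state guarded by a lock; the lock is a plain flag here. */
struct model_state {
	int locked;
	int overlay_count;	/* value the lock is meant to protect */
};

typedef void (*model_hook)(struct model_state *st);

static void model_lock(struct model_state *st)
{
	assert(!st->locked);
	st->locked = 1;
}

static void model_unlock(struct model_state *st)
{
	assert(st->locked);
	st->locked = 0;
}

/* Models another caller that takes the lock as soon as it is free. */
static void model_other_caller(struct model_state *st)
{
	if (st->locked)
		return;		/* lock held: the other caller has to wait */
	model_lock(st);
	st->overlay_count++;
	model_unlock(st);
}

/* Models the wait; 'hook' stands for whatever else runs meanwhile. */
static void model_wait(struct model_state *st, model_hook hook)
{
	if (hook)
		hook(st);
}

/* Returns 1 if the protected value changed across the wait, else 0. */
static int model_changed_across_wait(int drop_lock)
{
	struct model_state st = { .locked = 0, .overlay_count = 1 };
	int before, changed;

	model_lock(&st);
	before = st.overlay_count;

	if (drop_lock)
		model_unlock(&st);
	model_wait(&st, model_other_caller);
	if (drop_lock)
		model_lock(&st);

	changed = (st.overlay_count != before);
	model_unlock(&st);
	return changed;
}
```

The model covers only one side of the trade-off: it says nothing about
whether holding the lock across the wait could deadlock, which is the reason
given for dropping it.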

- Nuno Sá



* Re: [PATCH v2 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals
  2024-02-29  8:39 ` [PATCH v2 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals Herve Codina
  2024-02-29  9:47   ` Nuno Sá
@ 2024-02-29  9:50   ` Nuno Sá
  2024-02-29 10:14     ` Herve Codina
  1 sibling, 1 reply; 9+ messages in thread
From: Nuno Sá @ 2024-02-29  9:50 UTC (permalink / raw)
  To: Herve Codina, Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring,
	Frank Rowand
  Cc: Lizhi Hou, Max Zhen, Sonal Santan, Stefano Stabellini,
	Jonathan Cameron, linux-kernel, devicetree, Allan Nielsen,
	Horatiu Vultur, Steen Hegelund, Luca Ceresoli, Nuno Sa,
	Thomas Petazzoni, stable

On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
> In the following sequence:
>   1) of_platform_depopulate()
>   2) of_overlay_remove()
> 
> During the step 1, devices are destroyed and devlinks are removed.
> During the step 2, OF nodes are destroyed but
> __of_changeset_entry_destroy() can raise warnings related to missing
> of_node_put():
>   ERROR: memory leak, expected refcount 1 instead of 2 ...
> 
> Indeed, during the devlink removals performed at step 1, the removal
> itself releasing the device (and the attached of_node) is done by a job
> queued in a workqueue and so, it is done asynchronously with respect to
> function calls.
> When the warning is present, of_node_put() will be called but wrongly
> too late from the workqueue job.
> 
> In order to be sure that any ongoing devlink removals are done before
> the of_node destruction, synchronize the of_overlay_remove() with the
> devlink removals.
> 
> Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> Cc: stable@vger.kernel.org
> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> ---
>  drivers/of/overlay.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> index 2ae7e9d24a64..99659ae9fb28 100644
> --- a/drivers/of/overlay.c
> +++ b/drivers/of/overlay.c

In the cover letter, you mention the device.h inclusion but I'm not seeing
it. This is clearly up to the DT maintainers to decide but, IMHO, I would
very much prefer to see fwnode.h included here rather than device.h directly
(so yeah, renaming the function to fwnode_*). But yeah, I might be biased by
my own series :)

- Nuno Sá




* Re: [PATCH v2 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals
  2024-02-29  9:50   ` Nuno Sá
@ 2024-02-29 10:14     ` Herve Codina
  2024-02-29 10:25       ` Nuno Sá
  0 siblings, 1 reply; 9+ messages in thread
From: Herve Codina @ 2024-02-29 10:14 UTC (permalink / raw)
  To: Nuno Sá
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Lizhi Hou, Max Zhen, Sonal Santan, Stefano Stabellini,
	Jonathan Cameron, linux-kernel, devicetree, Allan Nielsen,
	Horatiu Vultur, Steen Hegelund, Luca Ceresoli, Nuno Sa,
	Thomas Petazzoni, stable

On Thu, 29 Feb 2024 10:50:21 +0100
Nuno Sá <noname.nuno@gmail.com> wrote:

> On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
> > In the following sequence:
> >   1) of_platform_depopulate()
> >   2) of_overlay_remove()
> > 
> > During the step 1, devices are destroyed and devlinks are removed.
> > During the step 2, OF nodes are destroyed but
> > __of_changeset_entry_destroy() can raise warnings related to missing
> > of_node_put():
> >   ERROR: memory leak, expected refcount 1 instead of 2 ...
> > 
> > Indeed, during the devlink removals performed at step 1, the removal
> > itself releasing the device (and the attached of_node) is done by a job
> > queued in a workqueue and so, it is done asynchronously with respect to
> > function calls.
> > When the warning is present, of_node_put() will be called but wrongly
> > too late from the workqueue job.
> > 
> > In order to be sure that any ongoing devlink removals are done before
> > the of_node destruction, synchronize the of_overlay_remove() with the
> > devlink removals.
> > 
> > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> > ---
> >  drivers/of/overlay.c | 9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > index 2ae7e9d24a64..99659ae9fb28 100644
> > --- a/drivers/of/overlay.c
> > +++ b/drivers/of/overlay.c  
> 
> In the cover, you mention device.h inclusion but I'm not seeing it? This is
> clearly up to the DT maintainers to decide but, IMHO, I would very much prefer
> to see fwnode.h included in here rather than directly device.h (so yeah,
> renaming the function to fwnode_*). But yeah, I might be biased by own series :)
> 

Damn. I missed device.h in this patch.
Without it, the patch does not compile :(

I forgot to squash a fixup commit before sending.

A v3 is planned to add this device.h include.

Nuno, do you prefer that I wait a few days for more replies before sending
this v3, or that I send it right now and you redo your comments on the v3?

I would really prefer to send it now :)

Sorry about my mistake.
Best regards,
Hervé

-- 
Hervé Codina, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com


* Re: [PATCH v2 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals
  2024-02-29 10:14     ` Herve Codina
@ 2024-02-29 10:25       ` Nuno Sá
  0 siblings, 0 replies; 9+ messages in thread
From: Nuno Sá @ 2024-02-29 10:25 UTC (permalink / raw)
  To: Herve Codina
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Lizhi Hou, Max Zhen, Sonal Santan, Stefano Stabellini,
	Jonathan Cameron, linux-kernel, devicetree, Allan Nielsen,
	Horatiu Vultur, Steen Hegelund, Luca Ceresoli, Nuno Sa,
	Thomas Petazzoni, stable

On Thu, 2024-02-29 at 11:14 +0100, Herve Codina wrote:
> On Thu, 29 Feb 2024 10:50:21 +0100
> Nuno Sá <noname.nuno@gmail.com> wrote:
> 
> > On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
> > > In the following sequence:
> > >   1) of_platform_depopulate()
> > >   2) of_overlay_remove()
> > > 
> > > During the step 1, devices are destroyed and devlinks are removed.
> > > During the step 2, OF nodes are destroyed but
> > > __of_changeset_entry_destroy() can raise warnings related to missing
> > > of_node_put():
> > >   ERROR: memory leak, expected refcount 1 instead of 2 ...
> > > 
> > > Indeed, during the devlink removals performed at step 1, the removal
> > > itself releasing the device (and the attached of_node) is done by a job
> > > queued in a workqueue and so, it is done asynchronously with respect to
> > > function calls.
> > > When the warning is present, of_node_put() will be called but wrongly
> > > too late from the workqueue job.
> > > 
> > > In order to be sure that any ongoing devlink removals are done before
> > > the of_node destruction, synchronize the of_overlay_remove() with the
> > > devlink removals.
> > > 
> > > Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> > > ---
> > >  drivers/of/overlay.c | 9 ++++++++-
> > >  1 file changed, 8 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c
> > > index 2ae7e9d24a64..99659ae9fb28 100644
> > > --- a/drivers/of/overlay.c
> > > +++ b/drivers/of/overlay.c  
> > 
> > In the cover, you mention device.h inclusion but I'm not seeing it? This is
> > > clearly up to the DT maintainers to decide but, IMHO, I would very much prefer
> > > to see fwnode.h included in here rather than directly device.h (so yeah,
> > > renaming the function to fwnode_*). But yeah, I might be biased by own series :)
> > 
> 
> Damned. I missed device.h in this patch.
> Without this one, the patch do not compile :(
> 
> A fixup commit I missed to squash before sending.
> 
> A v3 is planned to add this device.h.
> 
> Nuno, do you prefer I wait few days before sending this v3 waiting for more replies
> or I send it right now and you re-do your comment on the v3 ?
> 
> I would really prefer to send it now :)
> 

Typically, maintainers don't much like versions being re-spun too fast. That
said, it's up to you :). I can copy and paste my comments into v3.

- Nuno Sá



* Re: [PATCH v2 0/2] Synchronize DT overlay removal with devlink removals
  2024-02-29  8:39 [PATCH v2 0/2] Synchronize DT overlay removal with devlink removals Herve Codina
  2024-02-29  8:39 ` [PATCH v2 1/2] driver core: Introduce device_link_wait_removal() Herve Codina
  2024-02-29  8:39 ` [PATCH v2 2/2] of: overlay: Synchronize of_overlay_remove() with the devlink removals Herve Codina
@ 2024-02-29 10:55 ` Herve Codina
  2 siblings, 0 replies; 9+ messages in thread
From: Herve Codina @ 2024-02-29 10:55 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand
  Cc: Lizhi Hou, Max Zhen, Sonal Santan, Stefano Stabellini,
	Jonathan Cameron, linux-kernel, devicetree, Allan Nielsen,
	Horatiu Vultur, Steen Hegelund, Luca Ceresoli, Nuno Sa,
	Thomas Petazzoni

Hi All,

I made a mistake in this series.
As noted by Nuno, the device.h include is missing in patch 2 and so
patch 2 doesn't compile :(

A v3 fixing the missing device.h include has already been sent:
  https://lore.kernel.org/all/20240229105204.720717-1-herve.codina@bootlin.com/

Sorry for this error.
Best regards,
Hervé

On Thu, 29 Feb 2024 09:39:40 +0100
Herve Codina <herve.codina@bootlin.com> wrote:

> Hi,
> 
> In the following sequence:
>   of_platform_depopulate(); /* Remove devices from a DT overlay node */
>   of_overlay_remove(); /* Remove the DT overlay node itself */
> 
> Some warnings are raised by __of_changeset_entry_destroy() which  was
> called from of_overlay_remove():
>   ERROR: memory leak, expected refcount 1 instead of 2 ...
> 
> The issue is that, during the device devlink removals triggered from the
> of_platform_depopulate(), jobs are put in a workqueue.
> These jobs drop the reference to the devices. When a device is no more
> referenced (refcount == 0), it is released and the reference to its
> of_node is dropped by a call to of_node_put().
> These operations are fully correct except that, because of the
> workqueue, they are done asynchronously with respect to function calls.
> 
> In the sequence provided, the jobs are run too late, after the call to
> __of_changeset_entry_destroy() and so a missing of_node_put() call is
> detected by __of_changeset_entry_destroy().
> 
> This series fixes this issue introducing device_link_wait_removal() in
> order to wait for the end of jobs execution (patch 1) and using this
> function to synchronize the overlay removal with the end of jobs
> execution (patch 2).
> 
> Compared to the previous iteration:
>   https://lore.kernel.org/linux-kernel/20231130174126.688486-1-herve.codina@bootlin.com/
> this v2 series mainly:
> - Renames the workqueue used.
> - Calls device_link_wait_removal() a bit later to handle cases reported
>   by Luca [1] and Nuno [2].
>   [1]: https://lore.kernel.org/all/20231220181627.341e8789@booty/
>   [2]: https://lore.kernel.org/all/20240205-fix-device-links-overlays-v2-2-5344f8c79d57@analog.com/
> 
> Best regards,
> Hervé
> 
> Changes v1 -> v2
>   - Patch 1
>     Rename the workqueue to 'device_link_wq'
>     Add 'Fixes' tag and Cc stable
> 
>   - Patch 2
>     Add device.h inclusion.
>     Call device_link_wait_removal() later in the overlay removal
>     sequence (i.e. in free_overlay_changeset() function).
>     Drop of_mutex lock while calling device_link_wait_removal().
>     Add	'Fixes'	tag and Cc stable
> 
> Herve Codina (2):
>   driver core: Introduce device_link_wait_removal()
>   of: overlay: Synchronize of_overlay_remove() with the devlink removals
> 
>  drivers/base/core.c    | 26 +++++++++++++++++++++++---
>  drivers/of/overlay.c   |  9 ++++++++-
>  include/linux/device.h |  1 +
>  3 files changed, 32 insertions(+), 4 deletions(-)
> 


